Preface


This book of proceedings gathers the contributions presented at the 4th 
URV Doctoral Workshop in Computer Science and Mathematics. After the 
successful previous editions in 2014, 2015 and 2016 the fourth edition was held 
in Tarragona (Catalonia, Spain) on November 16th, 2017. It was jointly orga- 
nized by the Security and Privacy research group (CRISES) and the Doctoral 
Program on Computer Science and Mathematics of Security of Universitat 
Rovira i Virgili (URV), which celebrates its 10th anniversary this year. The 
main aim of this workshop is to promote the dissemination of the ideas, meth- 
ods and results that are developed in the Doctoral Thesis of the students of 
this doctorate program, and to promote the knowledge sharing, collaboration 
and discussion between their respective research groups. 

The workshop had two invited talks and fourteen oral presentations. The 
first invited talk was given by Prof. Josep Domingo-Ferrer, who is the Academic
Director of the Serra Hunter Programme (SHP). The SHP is part of the new 
academic staff model that the Government of Catalonia is promoting to re- 
inforce the internationalization of the Catalan universities, with the ultimate 
goal of consolidating Catalonia as the knowledge hub of Southern Europe; 
the talk described the selection process and the career expectations of Serra 
Hunter faculty members. The second invited talk was given by Dr. Sara Ha- 
jian, a former Ph.D. student of the Doctoral Program on Computer Science 
and Mathematics of Security of the URV, who is currently a research scientist 
at the Eurecat Technology Center in Catalonia. Her talk discussed the algo- 
rithmic bias that may happen in decision making based on Big Data, and the 
technical solutions to detect and prevent algorithmic discrimination. 

In this book, the reader will find the contributions of the fourteen Ph.D.
students who presented their work at the workshop. Each chapter presents
the research topic of each student, the goals of the Doctoral Thesis and some 
preliminary results. Contributions were framed in a variety of research lines, 
which include security and privacy in computer systems, artificial intelligence, 
medical informatics, hardware architectures and mathematics. All contributions
present innovative proposals, methods or applications, with the aim of
opening new and strategic research lines. The editors and organizers invite
you to contact the authors for more detailed explanations and encourage you 
to send them your suggestions and comments, which will certainly help them 
in their PhD theses. 

The members of the organizing committee were Dr. Alexandre Viejo, 
Dr. David Sanchez, Dr. Aida Valls (Coordinator of the Ph.D. program), Mr. 
Jesús Manjón and Mrs. Olga Seg.

We could not finish without first thanking the invited speakers for such 
interesting talks. Second, we thank all the participants and, especially, the 
students that presented their work in this DCSM workshop. Finally, we 
also want to thank Universitat Rovira i Virgili (URV), the Departament 
d’Enginyeria Informàtica i Matemàtiques (DEIM), and the Escola Tècnica
Superior d’Enginyeria (ETSE) for their support. 


Alexandre Viejo and David Sanchez (Editors)


Universal measures of disclosure risk and
information loss in individual data anonymization


Nicolas Ruiz * 


Department of Computer Engineering and Mathematics, Universitat Rovira i Virgili 
Tarragona, Spain 
nicolas.ruiz@oecd.org 


Data on individual subjects are increasingly collected and exchanged. By 
their nature, they provide a rich amount of information that can inform sta- 
tistical and policy analysis in a meaningful way. However, due to the legal 
obligations surrounding these data, this wealth of information is often not 
fully exploited in order to protect the confidentiality of respondents. In fact, 
such requirements shape the dissemination policy of micro data at national 
and international levels. The issue is how to ensure a sufficient level of data 
protection to meet releasers’ concerns in terms of legal and ethical require- 
ments, while offering to users a reasonable richness of information. Moreover, 
over the last decade the role of micro data has changed from being the pre- 
serve of National Statistical Offices and government departments to being a 
vital tool for a wide range of analysts trying to understand both social and 
economic phenomena. As a result, more parties, often very heterogeneous in 
their privacy and information requirements, are now involved in micro data 
transactions. This has opened a new range of questions and pressing needs 
about the privacy/information trade-off and the quest for best practices that 
can be both useful to users but also respectful of respondents’ privacy. 

Statistical disclosure control (SDC) research has a rich history in address- 
ing those issues, by providing the analytical apparatus through which the pri- 
vacy/information trade-off can be assessed and implemented. Over the years, 
it has burgeoned in many directions. But stemming from the large variety of
practical cases that can occur in micro data exchange is the diversity of
techniques available for data anonymization. Such diversity is undoubtedly
useful, but it has one major drawback: a lack of agreement and clarity on the
appropriate choice of tools in a given context and, as a consequence, the lack
of a general view (or at best an incomplete one) of the relative performance
of the available techniques.

Moreover, a variety of parties is involved in micro data exchange, and it is
natural to assume that different parties have different sensitivities to
privacy and information. Some, typically the data releasers, may place greater
emphasis on the preservation of privacy, while others, typically the
researchers, are relatively more concerned with the extent to which information
is preserved. These sensitivities can also differ within groups: one researcher
can have a low sensitivity to information loss and consider a release better
than no release at all, while another could simply disregard the data above a
certain threshold of loss set according to the intended use of the data.

A step toward the resolution of such limitations has recently been proposed
([1]), by establishing that any micro data masking method can be viewed as
functionally equivalent to a permutation of the original data plus, possibly,
a small noise addition. This insight, called the permutation paradigm,
unambiguously establishes a common ground upon which any masking method
can be gauged. It is independent of the underlying parameters of the masking
mechanism and of the characteristics of the data. Moreover, it has the
advantage of being meaningful and easy to grasp and implement: the only
information needed to compare methods, whether under different
parametrizations or on different data sets, is a distribution of permutation
distances. Thus, the permutation paradigm is also a tremendous simplifier for
data anonymization.

While this paradigm is not considered by its author as a new anonymization
method per se ([1]), it offers the potential to re-interpret all the available
techniques through the same lens. It remains, however, to develop a set of
appropriate measures of disclosure risk and information loss based on
permutation distances. This is the objective of this contribution.

To recall the permutation paradigm, we use a simple toy example which
consists (without loss of generality) of five records and three attributes
$X = (X_1, X_2, X_3)$ generated by sampling $N(10, 10^2)$, $N(100, 40^2)$ and
$N(1000, 2000^2)$ distributions, respectively. Noise drawn from $N(0, 5^2)$,
$N(0, 20^2)$ and $N(0, 1000^2)$ distributions, respectively, is then added to
obtain $Y = (Y_1, Y_2, Y_3)$, the three masked versions of the attributes. One
can see that the masking procedure generates a permutation of the records of
the original data (Figure 1).

Now, as long as the attributes’ values of a dataset can be ranked, which is
obvious in the case of numerical and categorical ordinal attributes, but also
feasible in the case of nominal ones ([1]), it is always possible to derive a
dataset $Z$ that contains the attributes $X_1$, $X_2$ and $X_3$, but ordered
according to the ranks of $Y_1$, $Y_2$ and $Y_3$, respectively, i.e. in
Figure 1, re-ordering $(X_1, X_2, X_3)$ according to the ranks
$(Y_{1R}, Y_{2R}, Y_{3R})$. This can be done following a post-masking reverse
mapping procedure. Finally, the masked data $Y$ can be fully reconstituted by
adding small noises $(E_1, E_2, E_3)$ (small in the sense that they cannot
re-rank $Z$, while they can still be large in absolute value) to each
observation in each attribute (Figure 2).
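
The reverse-mapping step described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the helper names
`ranks` and `reverse_map`, the random seed and the use of NumPy are not part of
the original text, the data are freshly sampled rather than taken from the
figures, and ties between values are not treated specially.

```python
import numpy as np

def ranks(col):
    # Rank 0 = smallest value; ties are broken by position (an assumption).
    return np.argsort(np.argsort(col, kind="stable"), kind="stable")

def reverse_map(X, Y):
    """Return (Z, E): Z holds the values of X re-ordered by the ranks of Y, and E = Y - Z."""
    Z = np.empty_like(X, dtype=float)
    for j in range(X.shape[1]):
        sorted_x = np.sort(X[:, j])
        # Record i of Z receives the original value whose rank equals the rank of Y[i, j].
        Z[:, j] = sorted_x[ranks(Y[:, j])]
    return Z, Y - Z

# Toy data following the distributions quoted in the text (seed chosen arbitrarily).
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(10, 10, 5),
                     rng.normal(100, 40, 5),
                     rng.normal(1000, 2000, 5)])
noise = np.column_stack([rng.normal(0, 5, 5),
                         rng.normal(0, 20, 5),
                         rng.normal(0, 1000, 5)])
Y = X + noise

Z, E = reverse_map(X, Y)
print(np.round(Z, 1))
print(np.round(E, 1))
```

By construction each column of `Z` contains exactly the values of the
corresponding column of `X`, `Z` and `Y` share the same ranks, and `Z + E`
gives back `Y`.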




        Original dataset X              Masked dataset Y

        X1     X2      X3            Y1     Y2      Y3
        13    135    3707             8    160    3248
        20     52     826            20     57     822
         2    123   -1317            -1    122     248
        15    165    2419            18    135     597
        29    160   -1008            29    164   -1927

Panel titles: Rank of the original attribute; Rank of the masked attribute.

Fig. 1: An illustration of the permutation paradigm


By construction, Z has the same marginal distribution as X, which is an 
appealing property. Moreover, under a maximum-knowledge intruder model 
of disclosure risk evaluation, the small noise addition turns out to be irrel- 
evant: re-identification via record linkage can only come from permutation, 
as by construction noise addition cannot alter ranks. Reverse mapping thus 
establishes permutation as the overarching principle of data anonymization, 
allowing the functioning of any method to be viewed as the outcome of a 
permutation of the original data, independently of how the method operates. 
This functional equivalence leads to the following proposition: 


Proposition 1 For a dataset $X_{(n,p)}$ with $n$ records and $p$ attributes
$(X_1, \ldots, X_p)$, its anonymized version $Y_{(n,p)}$ can always be written,
regardless of the anonymization methods used, as:


\[ Y_{(n,p)} = (P_1 X_1, \ldots, P_p X_p)_{(n,p)} + E_{(n,p)} \qquad (1) \]


where $P_1, \ldots, P_p$ is a set of $p$ permutation matrices and $E_{(n,p)}$ is
a matrix of small noises.


Proposition 1 is simply a restatement of the permutation paradigm. It 
has however several implications. The first is that it characterises
permutation matrices as an encompassing tool for data anonymization: the analytical
framework of anonymization mechanisms can in fact be viewed as functionally 
equivalent to a set of permutation matrices. Permutation matrices are mean- 
ingful, readable and practical in comparison to the sometimes quite complex 
analytical apparatus of some masking methods. 

        Original dataset X              Reverse-mapped dataset Z

        X1     X2      X3            Z1     Z2      Z3
        13    135    3707            13    160    3707
        20     52     826            20     52    2419
         2    123   -1317             2    123   -1008
        15    165    2419            15    135     826
        29    160   -1008            29    165   -1317

        Noise E                         Masked dataset Y (= Z + E)

        E1     E2      E3            Y1     Y2      Y3
        -5      0    -459             8    160    3248
         0      5   -1597            20     57     822
        -3      0    1256            -1    122     248
         2      0    -229            18    135     597
         0     -1    -610            29    164   -1927


Fig. 2: Equivalence in anonymization: post-masking reverse mapping plus noise
addition


From each underlying permutation matrix $P_j$, one can count, column by
column, by how many rows each 1 has been moved, using the identity matrix as a
starting point (which is a particular case of a permutation matrix with no
permutation applied), then assign a negative (resp. positive) sign if the 1
has been moved up (resp. down) and compute rank displacement vectors (where
each zero is replaced by an epsilon value). From the running example above,
one gets:


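\[ r_1 = (\epsilon, \epsilon, \epsilon, \epsilon, \epsilon)^T, \quad
   r_2 = (3, \epsilon, \epsilon, 1, -4)^T, \quad
   r_3 = (\epsilon, 2, 2, -2, -2)^T \]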
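
As a hedged illustration of this counting rule, the sketch below builds the
permutation matrix of one attribute and reads the signed displacements off its
columns. The function names `permutation_matrix` and `rank_displacement` are
hypothetical, NumPy is an assumption, and zeros are left as 0 rather than being
replaced by an epsilon value. The input values are those of the third attribute
in the running example.

```python
import numpy as np

def permutation_matrix(x, y):
    """Permutation matrix P such that P @ x is x re-ordered by the ranks of y."""
    x, y = np.asarray(x), np.asarray(y)
    n = len(x)
    rank_x = np.argsort(np.argsort(x))   # rank of each original value (0 = smallest)
    order_y = np.argsort(y)              # order_y[r] = row of y that holds rank r
    P = np.zeros((n, n), dtype=int)
    for i in range(n):
        P[order_y[rank_x[i]], i] = 1     # the value in row i moves to this row
    return P

def rank_displacement(P):
    """Signed number of rows each 1 has moved away from the diagonal, column by column."""
    rows = np.argmax(P, axis=0)          # row index of the 1 in each column
    return rows - np.arange(P.shape[0])  # negative = moved up, positive = moved down

x3 = [3707, 826, -1317, 2419, -1008]     # original third attribute
y3 = [3248, 822, 248, 597, -1927]        # masked third attribute
P3 = permutation_matrix(x3, y3)
print(P3 @ np.array(x3))                 # reverse-mapped attribute
print(rank_displacement(P3))             # zeros play the role of the epsilon entries
```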


Now, $r_j$ has to be evaluated in some way for assessing disclosure risk based
on permutation distances. Bearing in mind that different users can have
different views about disclosure risk (and thus about permutation distances),
the following proposition establishes a measure of disclosure risk sensitive to
different aversions, with an adjustable degree of focus on permutation
distances:


Proposition 2 For any attribute $j = 1, \ldots, p$ of $Y_{(n,p)}$, a
quantitative measure of disclosure risk in the permutation paradigm is given by:


\[ D_j(\alpha) = \left[ \frac{1}{n} \sum_{i=1}^{n} |r_j(i)|^{\alpha} \right]^{1/\alpha}
   \quad \text{for } \alpha \le 1 \wedge \alpha \ne 0 \]

\[ \text{and} \quad D_j(\alpha) = \prod_{i=1}^{n} |r_j(i)|^{1/n}
   \quad \text{for } \alpha = 0, \qquad (2) \]


where $r_j(i)$ denotes the elements of $r_j$ and $\alpha$ the parameter of
aversion to disclosure risk.


$D_j(\alpha)$ makes use of a power mean² ([3]) for the aggregation of the
components of $r_j$, with the parameter $\alpha$ substantiating the notion of
aversion to disclosure risk. The arithmetic mean is the special case
$\alpha = 1$ of $D_j(\alpha)$, which forms a natural starting point by
computing the average level of permutation distances. In that case, all
distances are given the same weight and there is a one-to-one substitution
between them, e.g. two records permuted two ranks are equivalent to one record
permuted four ranks. From this benchmark, the more $\alpha$ decreases, the more
weight is given to the smallest permutation distances³. In fact, the more
$\alpha$ approaches $-\infty$, the more $D_j(\alpha)$ converges towards the
smallest permutation distance in $r_j$⁴. As a result, for a given $r_j$ and
$\alpha' \le \alpha$, we have $D_j(\alpha') \le D_j(\alpha)$: the lower
$\alpha$ is, the stronger is the aversion to disclosure risk.
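
The behaviour of the disclosure risk measure for different aversion parameters
can be checked numerically. The following is a minimal sketch, assuming NumPy;
the function name `disclosure_risk`, the value chosen for epsilon and the
displacement vector used are hypothetical and serve only as an illustration.

```python
import numpy as np

def disclosure_risk(r, alpha, eps=1e-6):
    """Power mean of the absolute rank displacements, defined here for alpha <= 1."""
    if alpha > 1:
        raise ValueError("alpha must be <= 1")
    d = np.abs(np.asarray(r, dtype=float))
    d[d == 0] = eps                      # zeros replaced by a small epsilon (value assumed)
    if alpha == 0:
        return float(np.exp(np.mean(np.log(d))))    # geometric mean
    return float(np.mean(d ** alpha) ** (1.0 / alpha))

r = [1, -2, 4]                           # hypothetical displacement vector
for alpha in (1, 0, -1, -50):
    print(alpha, round(disclosure_risk(r, alpha), 4))
```

With this vector the arithmetic, geometric and harmonic means are 7/3, 2 and
12/7 respectively, and the value for a strongly negative parameter moves
towards the smallest absolute displacement, 1.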

Now, information loss can be assessed through a similar but also general
approach, by considering the degree of similarity between the permutations
that took place for two attributes and allowing different weights for
different relative distances. To do so, it can be observed that a vector
$\Delta(r_k)$ of differences between the vectors $r_j$ and $r_{j'}$ is a vector
of dissimilarity between the anonymization procedures that have been applied
to the couple of attributes $k = (j, j')$ (with $j \ne j'$). When all the
components of $\Delta(r_k)$ are equal to zero, $j$ and $j'$ have been permuted
in the same way: the permutation matrices applied to them are identical,
despite the fact that the anonymization methods used can be different in
practice. There is then no loss of information, as the joint distribution of
$j$ and $j'$ is preserved. But when $\Delta(r_k)$ has some non-zero elements,
information has been modified. This leads to the following proposition:


Proposition 3 For two attributes $j$ and $j'$ of $Y_{(n,p)}$, a quantitative
measure of information loss in the permutation paradigm is given by:


\[ I_k(\theta) = \left[ \frac{1}{n} \sum_{i=1}^{n} |\Delta r_k(i)|^{\theta} \right]^{1/\theta}
   \quad \text{for } \theta \ge 1, \qquad (3) \]


where $\Delta r_k(i)$ denotes the elements of $\Delta(r_k)$ and $\theta$ the
parameter of aversion to information loss.


The measure $I_k(\theta)$ bears strong analytical similarities with
$D_j(\alpha)$, but while the latter is concerned with average or small
permutation distances across records for a given attribute, the former
considers average or large relative permutation distances between two
attributes across records.
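
The information loss measure can be sketched in the same way. The function
name `information_loss` and the second displacement vector below are
hypothetical; the first vector is the one obtained for the third attribute of
the running example, with zeros kept as 0.

```python
import numpy as np

def information_loss(r_j, r_j2, theta):
    """Power mean of the absolute differences between two displacement vectors, theta >= 1."""
    if theta < 1:
        raise ValueError("theta must be >= 1")
    delta = np.abs(np.asarray(r_j, dtype=float) - np.asarray(r_j2, dtype=float))
    return float(np.mean(delta ** theta) ** (1.0 / theta))

r_a = [0, 2, 2, -2, -2]      # displacement vector of one attribute
r_b = [1, -1, 0, 1, -1]      # hypothetical displacement vector of another attribute

print(information_loss(r_a, r_a, 1))    # identical permutations: no loss
for theta in (1, 2, 20):
    print(theta, round(information_loss(r_a, r_b, theta), 4))
```

As the parameter grows, the measure gives more weight to the largest relative
displacement, which is 3 for this pair of vectors.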

The measures of Propositions 2 and 3 establish universal measures of
disclosure risk and information loss. They are universal in the sense that
they can be applied to any dataset and any method, but also because they can
account for the variety of preferences that occur in micro data transactions.


² In linear algebra, the power mean is also the formula for the computation of p-norms.
³ $D_j(0)$ is the geometric mean and $D_j(-1)$ the harmonic mean.
⁴ The limit case $D_j(-\infty)$ is strictly equal to the shortest permutation distance in $r_j$.


Acknowledgement. The author wants to thank Josep Domingo-Ferrer and Krish
Muralidhar for useful discussions on a preliminary version of this paper.


1 Introduction


The use of inferior vena cava (IVC) filters has become frequent among 
thrombotic patients suffering from additional complications. The IVC filters, 
which are surgically placed within the cava vein, are designed to prevent
pulmonary embolisms by holding back any circulating blood clot until it can be
dissolved by medication. There exist several different filters with different 
shapes, whose different effects and actual effectiveness are to date known only 
in a heuristic way or through statistical analysis performed on pulmonary em- 
bolism patients; detailed dynamical studies are available only for one specific 
filter model [1]. The purpose of the current research is to perform a prelimi- 
nary comparative study to assess how the presence of different filters affects 
the blood flow in the inferior vena cava. 


2 Methods 


We have obtained an accurate 3D model of a portion of a real patient’s cava
vein; we have also generated 3D models of several representative filter
geometries (see Fig. 1). We have used these models to simulate how the filters
affect the blood flow within the vein. In particular, computational fluid
dynamics (CFD) studies have been performed for the inferior vena cava model
both with and without filters within it.


3 Results 


In all of the cases investigated it was found that the presence of a filter
produces a significant increase of both wall shear stress levels and pressure
drop. Significant differences among calculated values of blood velocity and
viscosity were found for different cases. However, different placements
of the filter have minor effects on the blood dynamics.


Fig. 1: Geometrical models of the portion of cava vein (left) and four different
filter geometries (right). For the sake of clarity a different scale was used for
vein and filters.


* PhD advisors: Joan Herrero and Dolors Puigjaner

Josep M. López Besora


4 Discussion 


The present results suggest that, in a real patient, high wall shear stress
(WSS) levels might provoke the detachment of the filter from the vein wall,
which could have fatal consequences.


Acknowledgement. This work was supported by grant 2016PFR-URV-B2-66 from 
Universitat Rovira i Virgili.


1 Introduction


Cerebral Palsy (CP) is one of the most frequent causes of disability in 
childhood, with an incidence of 2 per 1000 live births. Of the 15 million
people with CP in the world, 1.3 million live in the EU. This neurological
disorder affects body movement, balance and posture, and is often accompanied
by cognitive or sensory impairments like mental retardation, deafness
and vision problems. The severity of these problems varies widely, from very
mild and subtle to very profound. But what is most affected by this condition,
from the youngest age, is the ability to play.

Play is probably the main activity for any child. Through play, children 
start to explore their world and lay the foundations of their own system of
values, which will be the cornerstone of their adult life. Play in children
with CP becomes difficult due to the disability, and this in turn can affect
the child’s self-esteem. In addition, the sensory and motor problems experienced by children
with CP affect how they interact with their surroundings, including the en- 
vironment and other people. Youth affected by CP have fewer opportunities 
to participate in traditional games and exercises such as playing basketball, 
riding a bike or playing ball with their friends. 

In this context, video games, and in particular exergames, represent a 
very promising way to enable youths with CP to perform the exercise they 
need to break the cycle of deconditioning, while allowing them to socialize 
with others in fun ways from the comfort of their homes. Exergames are a 
combination of exercise (or exertion) and video games. In particular, we refer 
to digital games that require actions of large body parts or the whole body 
to control gameplay. Reviews of exergames indicate that they have positive 
effects both on motivation for active participation in rehabilitation and on 
impaired functions. However, the design of these games can be challenging if
the goal is to help players socialize with others. First, limitations in the
physical abilities of youth with CP make it difficult for them to play many of
the existing exergames. Second, there are challenges to social play, such
as establishing player groups and playing with players with different abilities,
that need special consideration.


2 GAmification for a Better LifE 


GABLE aims to create the first online video games service for youths
with CP, which will be the hosting platform of games focused on improving 
motor skills and visual-motor coordination for youths with CP. These games 
will leverage the latest advances in Computer Vision techniques, in order to 
improve accessibility. The platform will be constructed with social networking 
in mind, which will allow parents, caregivers and patients to socialize in a 
common environment, to share experiences, in an effort to provide the best 
care for CP patients. Caregivers would also be able to share best practices and 
lessons learned among themselves, which will help them to provide a better 
service across Europe and beyond. 

GABLE was born out of the observation that there is little or no help for
youths with CP to play games specifically suited to their disability. Such
games should motivate them to play more, help to rehabilitate their motor and
visual-motor skills and, at the same time, introduce them to multiplayer/online
gaming that would improve their social skills and, by extension, their social
inclusion among their peers. Looking at the research done into these problems,
we have seen that only small steps have been made towards solving them. Several
isolated demonstrations, in a research environment, have shown that there is 
great potential in exergames to be used as rehabilitation tools, while others 
have shown that multiplayer can bring about social interaction leading to an 
improvement of social skills for disabled patients. 

What GABLE wants to do is to create the first online platform of games 
for patients with CP, where they, and their caregivers and parents, can have 
instant access to games, join a community of peers, and share their knowledge 
in order to provide the best possible help for all youth with CP. The platform 
will be built around the idea that online multiplayer games take advantage of 
the motivational aspects of group activity, and can provide additional motiva- 
tion compared to single player games. These games are inherently promising 
for people with disabilities that are confined to their homes or care centres, 
so they have the potential to reach the widest audience. 


3 Machine Learning in GABLE 


One of the important aspects of GABLE is to harvest useful information
from raw data in order to help caregivers and patients. To this end, we are
designing a machine learning module, integrated in the backbone of the
development platform and game-authoring tools, which performs three main
tasks: statistical analysis, predictive analysis and recommendation.


3.1 Statistical Analysis 


This module will provide basic analysis to the caregivers and parents, who
will be able to track the progress of the patients using the game scores.
The module will analyse all data collected, and will display statistical data on
the types of games a patient has played, the progress she/he has made while
playing a certain game repeatedly, etc. It will give an overall view of all the
activity of the player, including game time and preference for a certain type
of game, and will assist the caregiver in the active monitoring of a patient.

From a statistical perspective, social inclusion and education are latent
variables of the data that are related to the game score. For example, a game
that is mainly played using body motions trains the patients’ motor
functions. Consequently, if the game score of a patient improves over time
by playing this game, we can infer that the motor function of the patient has
improved as well. A second set of tools will apply more advanced discovery
techniques to the data collected from all users of the GABLE platform.
In the above example, it is possible to create a multivariate regression model
for predicting the score of a user in a particular game if she/he plays the game
more often. The regression model in this example may consider “age”, “times
played”, “gender” and “genre of game” in order to predict the game score. We will
create various regression models by automatically selecting an appropriate set
of independent variables to predict a dependent variable. Using these models,
caregivers can predict the score of their patients in order to see whether
playing a particular game will be useful for them.
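
A regression of this kind can be sketched with ordinary least squares. The
example below is only an illustration under assumptions: the data are synthetic,
only two of the candidate predictors ("age" and "times played") are used, the
coefficients of the generating model are invented, and the helper names
`fit_linear` and `predict` are hypothetical.

```python
import numpy as np

# Synthetic data standing in for collected game logs (all values invented).
rng = np.random.default_rng(42)
n = 200
age = rng.integers(6, 18, n)
times_played = rng.integers(1, 50, n)
score = 20 + 1.5 * age + 0.8 * times_played + rng.normal(0, 2, n)

def fit_linear(features, target):
    """Least-squares fit of target = b0 + b1*f1 + b2*f2 + ... ; returns the coefficients."""
    A = np.column_stack([np.ones(len(target))] + list(features))
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coef

def predict(coef, *values):
    """Predicted score for one player described by the given feature values."""
    return float(coef[0] + np.dot(coef[1:], values))

coef = fit_linear([age, times_played], score)
print(np.round(coef, 2))
print(round(predict(coef, 10, 30), 1))   # a 10-year-old who has played 30 times
```

Categorical predictors such as "gender" or "genre of game" would first have to
be encoded numerically, for instance as indicator columns, before entering the
same fit.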


3.2 Predictive Analysis 


This module will use the data gathered on the games played by the patients 
and their progress within a game in order to provide the caregivers recommen- 
dations on changing the difficulty level of a certain game, or recommend more 
challenging games. By analysing the data generated by the patients playing 
the game, the system can learn a regression or a classification model to predict 
the score of the patient in a certain scenario, and suggest to the caregiver in 
which configuration of a scenario the score of a particular patient will increase 
or decrease. With this information at hand, caregivers/parents will be able to 
change the configuration of the games in order to make them easier or more
challenging. We will utilize a probabilistic graphical model [1] to achieve
this goal. To be more specific, we will create a Bayesian network to model the
dependencies between different variables and factorize the joint probability
density function into smaller functions. This model has two major advantages.
First, it can deal with missing values by marginalizing the probabilistic
model over them. Second, given the values of some of the parameters of a game,
it can suggest values for the other parameters such that the probability of
winning the game is reduced or increased. This can be done by computing the
most probable explanation of the unknown variables. It should be noted that
each game might have a different scenario. Therefore, we must create a
separate graphical model for each game, taking into account the interaction
between its parameters and the characteristics of the patients.
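
The two advantages mentioned above can be illustrated on a deliberately tiny
network with three binary variables. Everything in this sketch is an
assumption made for illustration: the variables (player skill, game difficulty,
win), the probability tables and the function names are invented, and inference
is done by brute-force enumeration rather than with a dedicated library.

```python
from itertools import product

# Hypothetical network: skill -> win <- difficulty, with invented probabilities.
P_SKILL = {"low": 0.5, "high": 0.5}
P_DIFF = {"easy": 0.5, "hard": 0.5}
P_WIN = {("low", "easy"): 0.6, ("low", "hard"): 0.2,
         ("high", "easy"): 0.9, ("high", "hard"): 0.5}

def joint(skill, difficulty, win):
    """Joint probability, factorized as P(skill) * P(difficulty) * P(win | skill, difficulty)."""
    p_win = P_WIN[(skill, difficulty)]
    return P_SKILL[skill] * P_DIFF[difficulty] * (p_win if win else 1.0 - p_win)

def prob_win(difficulty, skill=None):
    """P(win | difficulty[, skill]); a missing skill value is marginalized out."""
    skills = [skill] if skill is not None else list(P_SKILL)
    num = sum(joint(s, difficulty, True) for s in skills)
    den = sum(joint(s, difficulty, w) for s in skills for w in (True, False))
    return num / den

def most_probable_explanation(**evidence):
    """Full assignment with the highest joint probability that agrees with the evidence."""
    best, best_p = None, -1.0
    for s, d, w in product(P_SKILL, P_DIFF, (True, False)):
        assignment = {"skill": s, "difficulty": d, "win": w}
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        p = joint(s, d, w)
        if p > best_p:
            best, best_p = assignment, p
    return best

print(prob_win("easy"))                                    # skill unknown
print(most_probable_explanation(win=True, skill="low"))    # which difficulty to suggest
```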


3.3 Recommender System 


This module will provide a rating for each game, as a measure of how many
patients are playing this game, or how highly it was rated by the patients
and by caregivers and parents. Using this rating system, the module can
then recommend a certain game for a patient, or for the caregiver/parent.
We will follow the collaborative filtering approach [2][3] for this purpose.
Taking into account the fact that collaborative filtering mainly works with user
preferences, we will explicitly collect information about the taste of each user
by asking them to rate the games and analysing their search queries. We may
also implicitly estimate the taste of each user from the amount of time
that they have played a particular game. In addition, we may also consider a
hybrid recommendation system by storing some information about each game,
such as genre, type of game and graphical information (2D/3D).
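
One common form of collaborative filtering, user-based filtering with cosine
similarity, can be sketched as follows. This is an illustrative choice rather
than the design of the GABLE module: the rating matrix is invented, a rating of
0 is taken to mean "not rated", and the function names are hypothetical.

```python
import numpy as np

# Rows = users, columns = games; 0 means "not rated" (invented example data).
ratings = np.array([
    [5, 4, 0, 1, 0],
    [4, 5, 5, 1, 2],
    [1, 1, 2, 5, 4],
    [5, 5, 4, 2, 1],
], dtype=float)

def cosine(a, b):
    """Cosine similarity between two rating vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict_rating(ratings, user, game):
    """Similarity-weighted average of the ratings other users gave to the game."""
    num = den = 0.0
    for other in range(ratings.shape[0]):
        if other == user or ratings[other, game] == 0:
            continue
        w = cosine(ratings[user], ratings[other])
        num += w * ratings[other, game]
        den += w
    return num / den if den > 0 else 0.0

def recommend(ratings, user):
    """Index of the unrated game with the highest predicted rating, or None."""
    unrated = np.flatnonzero(ratings[user] == 0)
    if len(unrated) == 0:
        return None
    return int(max(unrated, key=lambda g: predict_rating(ratings, user, g)))

print(recommend(ratings, 0))
print(round(predict_rating(ratings, 0, 2), 2), round(predict_rating(ratings, 0, 4), 2))
```

Implicit signals such as playing time could be turned into pseudo-ratings and
fed to the same routine.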


Acknowledgement. This project has received funding from the European Union’s
Horizon 2020 research and innovation programme under grant agreement No. 732363.


1 Introduction


Our own interest in deprojection angles stems from the fact that we started 
a quantitative study of the properties of spiral structure in near-by disc galax- 
ies, and for this we first need to deproject all our images. Indeed the list of all 
studies for which it is necessary to know the spatial orientation of the galaxy 
is too long to include here. 

We will study the spiral structure of disc galaxies by decomposing each image
by means of bidimensional Fourier transforms. The first step is to deproject the
galaxy image. It is thus necessary to determine the two deprojection angles,
namely the position angle (hereafter PA) and the inclination angle (hereafter
IA). The PA is the angle between the line of nodes of the projected image and
the north, measured towards the east, while the IA is the angle between the
perpendicular to the plane of the galaxy and the line of sight.

Several methods have been proposed so far to obtain these angles. All of 
them suffer from some kind of systematic errors. The two methods we present 
can obtain very accurate values of the deprojection angles and also perform 
well for very low resolutions. This indicates that our methods can be used for 
very distant galaxies. 


2 Deprojection methods 


We introduce two methods, both based on Fourier transforms and closely
linked to the two methods used by Garcia-Gomez and Athanassoula in
[10] for HII region distributions in 1991. Let $I(u, \theta)$ be the image of
the galaxy written in polar coordinates $(r, \theta)$, with $u = \ln(r)$. We
define the Fourier transform of this image as:

\[ A(p, m) = \int_{u_{min}}^{u_{max}} \int_{0}^{2\pi} I(u, \theta) \, e^{i(pu + m\theta)} \, d\theta \, du \qquad (1) \]


In this equation, $p$ corresponds to the radial frequency and $m$ to the
azimuthal frequency. Thus the $m = 1$ values correspond to one-armed
components, the $m = 2$ values to two-armed components and so on. The values
$u_{min} = \ln(r_{min})$ and $u_{max} = \ln(r_{max})$ are set by the inner and
outer radius of the part of the image that we will analyze. Fixing the value
of $m$, we can calculate the power associated with this component simply as:


\[ P_m = \int_{-p_{max}}^{p_{max}} |A(p, m)| \, dp \qquad (2) \]


The $p_{max}$ value is related to the resolution in Fourier space through
$p_{max} = \frac{1}{2 \Delta u}$, with $\Delta u = \frac{u_{max} - u_{min}}{N}$,
where $N$ is the number of points used in the Fourier transform in the radial
dimension, usually $N = 256$ or $512$. In our first method we try to minimize
the effect of the spiral structure by minimizing the ratio:


\[ BAG1 = \frac{P_1 + P_2 + \cdots + P_6}{P_0 + P_1 + \cdots + P_6} \]

This is equivalent to maximizing the contribution of the axisymmetric
component. Since a badly deprojected galaxy will look oval, and thus contribute
to the $m = 2$ components as a bar, for our second method we simply minimize
the ratio:

\[ BAG2 = \frac{P_2}{P_0} \]
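
To make the two ratios concrete, the following sketch evaluates them for an
image already sampled on a regular grid in $u = \ln(r)$ and $\theta$. It is an
illustration under assumptions, not the authors' code: the integrals are
replaced by simple Riemann sums, the function names are hypothetical, the
choice of `p_max` as one over twice the radial grid step is an assumption, and
the two test images (an exponential disc and the same disc with an added
two-fold modulation) are invented.

```python
import numpy as np

def fourier_coefficient(image, u, theta, p, m):
    """Riemann-sum approximation of A(p, m) on a regular (u, theta) grid."""
    du, dtheta = u[1] - u[0], theta[1] - theta[0]
    phase = np.exp(1j * (p * u[:, None] + m * theta[None, :]))
    return np.sum(image * phase) * du * dtheta

def power(image, u, theta, m, p_max, n_p=33):
    """Approximate integral of |A(p, m)| over p in [-p_max, p_max]."""
    ps = np.linspace(-p_max, p_max, n_p)
    amps = [abs(fourier_coefficient(image, u, theta, p, m)) for p in ps]
    return float(np.sum(amps) * (ps[1] - ps[0]))

def bag_ratios(image, u, theta, p_max):
    """Return (BAG1, BAG2) computed from the powers of the components m = 0..6."""
    P = [power(image, u, theta, m, p_max) for m in range(7)]
    return sum(P[1:]) / sum(P), P[2] / P[0]

u = np.linspace(np.log(0.1), np.log(10.0), 64)
theta = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
p_max = 1.0 / (2.0 * (u[1] - u[0]))          # assumed relation to the grid step

disc = np.exp(-np.exp(u))[:, None] * np.ones_like(theta)[None, :]
oval = disc * (1.0 + 0.3 * np.cos(2.0 * theta))[None, :]

print(bag_ratios(disc, u, theta, p_max))     # axisymmetric image
print(bag_ratios(oval, u, theta, p_max))     # image with an m = 2 distortion
```

For the axisymmetric image both ratios vanish up to rounding error, while the
distorted image gives clearly positive values, which is the behaviour the two
criteria rely on.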
We tested both methods using artificially generated, yet realistic, galaxies
described by an exponential disc with spiral components:
I(r0) = e7/% 4 Ae'—"0/%) eS") cos(p In(r) + 26) 


We obtain very accurate values of the deprojection angles for a variety of
situations, from inclinations greater than 80° to face-on galaxies. The tests
also show that our methods perform well at very low resolutions, which
indicates that they can be applied to very distant galaxies of cosmological
interest.


3 Deprojection of the galaxies 


We applied the two methods to the Frei sample galaxies as follows. First
we constructed a grid covering the whole range of possible values of PA and IA
in increments of 2°. For each pair of angles (PA, IA) we deprojected the galaxy
image and computed the Fourier transform (1) with the help of a polar grid.
Using Eq. (2) we then calculated the power in each component and then the
value of the ratios BAG1 and BAG2. We repeated this for every (PA, IA) pair
in the grid. The optimum values are those for which the ratio is at a minimum.
We illustrate the use of our methods with the help of galaxy NGC4501.
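
The grid search can be sketched as below for the BAG2 criterion. Several
simplifications are assumed and are not taken from the text: the sky image is
an analytic function rather than a pixel array, the position angle is measured
from the x axis of that function, the powers are approximated by plain sums
over a few radial frequencies, and all function names are hypothetical. The
synthetic test galaxy is a featureless exponential disc with known angles.

```python
import numpy as np

def polar_sample(sky_image, pa, ia, u, theta):
    """Sample the sky-plane intensity on a (u, theta) grid of the assumed galaxy plane."""
    r = np.exp(u)[:, None]
    x_gal = r * np.cos(theta)[None, :]           # along the line of nodes
    y_gal = r * np.sin(theta)[None, :]
    y_proj = y_gal * np.cos(ia)                  # foreshortening due to the inclination
    x_sky = x_gal * np.cos(pa) - y_proj * np.sin(pa)
    y_sky = x_gal * np.sin(pa) + y_proj * np.cos(pa)
    return sky_image(x_sky, y_sky)

def bag2(image, u, theta, p_values):
    """Ratio P_2 / P_0; constant grid factors cancel, so plain sums are used."""
    def power(m):
        total = 0.0
        for p in p_values:
            phase = np.exp(1j * (p * u[:, None] + m * theta[None, :]))
            total += abs(np.sum(image * phase))
        return total
    return power(2) / power(0)

def grid_search(sky_image, u, theta, p_values, step_deg=2):
    """Return (minimum BAG2, PA in degrees, IA in degrees) over the grid of angles."""
    best = (np.inf, None, None)
    for pa_deg in range(0, 180, step_deg):
        for ia_deg in range(0, 90, step_deg):
            img = polar_sample(sky_image, np.radians(pa_deg), np.radians(ia_deg), u, theta)
            value = bag2(img, u, theta, p_values)
            if value < best[0]:
                best = (value, pa_deg, ia_deg)
    return best

def make_inclined_disc(pa0, ia0):
    """Sky image of an exponential disc seen with position angle pa0 and inclination ia0."""
    def sky_image(x, y):
        x_gal = x * np.cos(pa0) + y * np.sin(pa0)
        y_proj = -x * np.sin(pa0) + y * np.cos(pa0)
        return np.exp(-np.hypot(x_gal, y_proj / np.cos(ia0)))
    return sky_image

u = np.linspace(np.log(0.2), np.log(5.0), 32)
theta = np.linspace(0.0, 2.0 * np.pi, 32, endpoint=False)
p_values = np.linspace(-4.0, 4.0, 9)
galaxy = make_inclined_disc(np.radians(40), np.radians(60))

# A coarse 10-degree grid keeps the example fast; the text uses 2-degree steps.
print(grid_search(galaxy, u, theta, p_values, step_deg=10))
```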

Finally, we computed correlations to compare our results with those of other
methods in the literature. Our two methods give mean correlation coefficients
with the rest of the methods of 0.89 for BAG1 and 0.9 for BAG2 in the case of
PA, and of 0.87 and 0.88 for BAG1 and BAG2, respectively, in the case of IA.
This indicates that our methods are well suited for the derivation of the
deprojection angles. In general, we can conclude that all the methods for
deriving the deprojection angles are well suited from a statistical point of
view.
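
The grid search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: `toy_powers` is a hypothetical stand-in for the real pipeline (deprojection, polar-grid Fourier transform and Eq. (2)), the angle ranges are assumed, and BAG1 is read here as the power in the m = 1 to 6 components divided by the total power in m = 0 to 6.

```python
from itertools import product


def bag1(powers):
    """BAG1 ratio (assumed form): power in m = 1..6 over power in m = 0..6."""
    return sum(powers[1:7]) / sum(powers[0:7])


def bag2(powers):
    """BAG2 ratio: power of the m = 2 component over the m = 0 component."""
    return powers[2] / powers[0]


def grid_search(power_fn, ratio, step=2):
    """Return the (PA, IA) pair, in degrees, that minimizes `ratio`.

    The ranges 0..178 for PA and 0..88 for IA are assumptions.
    """
    grid = product(range(0, 180, step), range(0, 90, step))
    return min(grid, key=lambda angles: ratio(power_fn(*angles)))


def toy_powers(pa, ia, true_pa=140, true_ia=60):
    """Hypothetical stand-in: the m = 2 power grows with the angle error."""
    err = abs(pa - true_pa) + abs(ia - true_ia)
    return [10.0, 0.5, 1.0 + 0.1 * err, 0.3, 0.2, 0.1, 0.1]


best = grid_search(toy_powers, bag2)
print(best)  # (140, 60)
```

In a real application `toy_powers` would be replaced by a function that deprojects the image for the given angles and returns the powers P_0 to P_6.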


Fig. 1: Image of NGC4501 before and after deprojection (PA = 141, IA = 60)

1 Introduction


Reducing environmental pollution and achieving greater sustainability in
urban mobility are two of the major challenges that big cities face in the
21st century. Promoting a rational use of vehicles, for instance through
vehicle-sharing incentives or the use of electric vehicles, is one of the
current strategies. On this basis, many cities have started establishing
so-called Low Emission Zones (LEZ), which are zones where a number of
restrictions and penalties are applied to their users. These measures are
aimed at reducing the traffic of combustion engine vehicles and encouraging
the use of less polluting, low-emission ones, preferably electric vehicles.

Although these strategies have proven to be effective in large cities, on a
practical level their implementation is neither simple nor economical. One of
the main technological challenges regarding the LEZ scheme is to design a
secure and reliable system that automatically controls the access of vehicles
to these areas. Privacy also raises important challenges in the field and
calls for alternative user detection systems that do not rely on video cameras
recording the plates of all the vehicles that access the LEZs.

Our general objective is to provide secure protocols that automatically
control vehicle access to LEZs while preserving the privacy of the drivers
as long as they behave honestly.


2 Related Work 


In recent years, several LEZ access control approaches, known as Electronic
Road Pricing (ERP) systems, have been proposed on the basis of privacy by
design [1, 2, 3, 4, 5, 6, 7, 8]. All these systems require the use of an
On-Board Unit (OBU) fitted with a GPS and a wireless communication system.
The price of the fare is calculated according to the route the vehicle has
traveled. On the one hand, in [1] and [2], the information related to the
traveled route is sent by the OBU to an external server, owned by the Service
Provider (SP), which is in charge of setting the prices in each billing
period. On the other hand, in [3, 4, 5, 6] it is the OBU that calculates the
fees and sends them to the SP server in each billing period. In that way, the
revealed information relating to the location of the vehicle is minimal. These
systems use cryptographic evidence along with randomly located physical
checkpoints to demonstrate that the OBU has been honest when calculating the
amounts corresponding to the traveled routes. The work in [7] presents a
privacy-preserving protocol based on a time approach which, unlike the
aforementioned works, offers non-probabilistic fraud control. A further
improvement of this protocol has been published in [8]. This proposal enhances
the pricing system to dynamically adapt fares to changing traffic conditions,
aiming at a better traffic distribution. Even though these protocols tackle
the most important drawbacks of the systems proposed to date, their
particularities mean that they require specific OBUs and full access to some
of their functionalities. Nevertheless, OBU integration in today’s vehicles is
not widespread and, since OBUs are proprietary devices, access to most of
their capabilities may be restricted for third parties.


3 Model of the system 


Our general objective is to encourage the integration of smartphones into
LEZ access control systems. The current anonymous approaches to controlling
access to LEZs rely on the vehicles’ On-Board Units (OBUs); nevertheless,
their integration in today’s vehicles is not widespread. The adoption of the
drivers’ smartphones for this purpose may ease the rollout and acceptance of
these zones. In any case, privacy is mandatory and should be preserved as
long as the drivers do not try to commit fraud. Only when a user accesses the
LEZ without the proper authorization should she be identified and her
anonymity revoked.

The scheme we propose in [9] presents a lightweight ERP solution that
controls the access to a LEZ in a secure and reliable way, while providing
privacy to honest users. In contrast to other systems, our approach uses the
drivers’ smartphones to validate their access instead of relying on an OBU.
Users who access the LEZ without proper authorization are automatically
identified for their subsequent sanction. Accordingly, none of the anti-fraud
measures affects the privacy of honest drivers.

The lifecycle of our system is divided into eight phases: i) Registration; ii)
Installation; iii) Vehicle Registration; iv) Access; v) Exit; vi) Payment; vii)
Fraud Control; and viii) Privacy Configuration.

Fig. 1: Access infrastructure.

Before a user can start using the proposed access model, she should complete
the Registration (i), Installation (ii) and Vehicle Registration (iii) phases,
where she registers her personal data with the Competent Administration of the
LEZ (CALEZ), installs the mobile application on her smartphone and registers
the vehicles she will use to access the LEZ, respectively. Figure 1 shows
a general scheme of the LEZ Access (iv) and Exit (v) phases. Both phases
are presented together as they perform the same operations. When the vehicle
approaches the LEZ access area, a Bluetooth Low Energy (BLE) tag wakes up
the application on the user’s smartphone. This is done automatically, without
the intervention of the user. For its part, the Input Sensor notifies the
Access Control (AC) entity that a vehicle has entered the LEZ access area.
The mobile phone application establishes a secure communication with the AC
entity through a cryptographic protocol and proves that the user is valid.
During the process, the user’s anonymity is preserved through the use of a
pseudonym. Then, the AC verifies whether the user’s access permissions are
correct or not. Moreover, the access and exit points are equipped with
several sensors to obtain the vehicle’s profile (height and length). If the
user’s credentials are valid, the access is registered and the user can
privately access the LEZ. This access information will be used during the
Payment phase (vi) to calculate the fee the user has to pay. Conversely, if
the user does not have valid access permissions, the AC will take a photo of
the vehicle’s license plate. With this photo the system will be able to
identify the offending user. Additional anti-fraud measures are performed in
the Fraud Control phase (vii), where an independent entity looks for
inconsistent patterns in the registered accesses and exits. Finally, to
prevent all the registered accesses of a user from being linked together
through her pseudonym, a user can ask for a new one by running the protocol
defined in the Privacy Configuration phase (viii).
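
The decision taken by the AC entity in the Access and Exit phases can be sketched as follows. This is a deliberately simplified illustration: the class, its fields and the set-membership check are hypothetical, and the check stands in for the cryptographic protocol of the actual scheme, which is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class AccessControl:
    """Hypothetical AC entity; names and fields are illustrative only."""
    valid_pseudonyms: set
    access_log: list = field(default_factory=list)
    photos: list = field(default_factory=list)

    def handle_vehicle(self, pseudonym, profile, plate):
        """Register an authorized access, or photograph the plate otherwise."""
        if pseudonym in self.valid_pseudonyms:
            # Honest user: only the pseudonym and the vehicle profile
            # (height, length) are stored, never the plate.
            self.access_log.append((pseudonym, profile))
            return "registered"
        # No valid credentials: the plate photo identifies the offender.
        self.photos.append(plate)
        return "photographed"


ac = AccessControl(valid_pseudonyms={"pseud-42"})
print(ac.handle_vehicle("pseud-42", (1.5, 4.2), "1234ABC"))  # registered
print(ac.handle_vehicle("unknown", (1.5, 4.2), "9999XYZ"))   # photographed
```

The point of the sketch is that the plate is only recorded on the failure branch, which is how the privacy of honest drivers is kept separate from fraud handling.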

1 Introduction


Diabetic Retinopathy (DR) is a leading disabling chronic disease and one
of the main causes of blindness and visual impairment in developed countries
for diabetic patients. Studies have reported that 90% of the cases can be
prevented through early detection and treatment. Eye screening through retinal
images is used by physicians to detect the lesions related to this disease.
Due to the increasing number of diabetic people, the number of images to be
manually analyzed is becoming unmanageable. Moreover, training new personnel
for this type of image-based diagnosis takes a long time, because expertise
has to be acquired through daily practice.

Deep Learning is a set of Machine Learning techniques for automatically
constructing a model using multiple levels of representation from the
underlying distribution of a large set of examples, with the final objective
of mapping a high-dimensional input into a lower-dimensional output
($f: \mathbb{R}^n \to \mathbb{R}^m$, $n > m$). This mapping allows the
classification of multidimensional objects into a small number of categories.
The model is composed of many neurons organized in layers and blocks of
layers, arranged as a hierarchical cascade. Every neuron receives its input
from a predefined set of neurons, and every connection has a parameter that
corresponds to its weight. The function of every neuron is to transform the
received inputs into a calculated output value: for every incoming connection,
the weight is multiplied by the input value received by the neuron, and the
aggregated value is passed to an activation function that calculates the
output of the neuron. The parameters are usually optimized using a stochastic
gradient descent algorithm that minimizes a predefined loss function. The
parameters of the network are updated after backpropagating the loss function
gradients through the network. These hierarchical models are able to learn
multiple levels of representation that correspond to different levels of
abstraction, which enables the representation of complex concepts in a
compressed way [6], [8], [2], [1].
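
As a small illustration of the neuron computation described above, the following sketch multiplies each input by the weight of its connection, aggregates the products and applies an activation function. The bias term and the choice of a sigmoid activation are assumptions made for the example; the text only mentions weights and an unspecified activation function.

```python
import math


def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs followed by a sigmoid activation."""
    # Aggregate weight * input over all incoming connections.
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Sigmoid activation (an illustrative choice).
    return 1.0 / (1.0 + math.exp(-z))


print(neuron_output([1.0, 2.0], [0.5, -0.25]))  # 0.5
```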

The Quadratic Weighted Kappa (QWK) index is used in many medical diagnosis
systems because diseases have different degrees of severity, which are
naturally ordered from mild to the most critical cases. If the diagnosis is
based on image analysis, the classification is even more difficult because
the interpretation of the image data normally involves some level of
subjectivity, which sometimes makes the conclusions of different experts
differ [5]. Quadratic Weighted Kappa is able to measure the level of
discrepancy between sets of diagnoses made by different raters over the same
population [9]. The strength of agreement between the raters is evaluated as
a function of the distance between the predictions of both raters. For the
case of diabetic retinopathy detection, human expert raters report inter-rater
QWK values of about 0.80. This index has been used to evaluate the
performance of the predictive model in comparison with the level of human
experts.
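
For reference, the standard definition of the QWK index between two raters can be written in plain Python as below. This is the usual textbook formulation (quadratic disagreement weights, expected counts from the marginal histograms), given as a sketch; it is not taken from the cited works, and the function name is chosen only for this example.

```python
def quadratic_weighted_kappa(rater_a, rater_b, n_classes):
    """QWK between two lists of integer labels in the range 0..n_classes-1."""
    n = len(rater_a)
    # Observed confusion matrix between the two raters.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for a, b in zip(rater_a, rater_b):
        observed[a][b] += 1
    hist_a = [sum(row) for row in observed]
    hist_b = [sum(observed[i][j] for i in range(n_classes))
              for j in range(n_classes)]
    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # The penalty grows with the squared distance between classes.
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            expected = hist_a[i] * hist_b[j] / n
            num += weight * observed[i][j]
            den += weight * expected
    return 1.0 - num / den


print(quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4], 5))  # 1.0
print(quadratic_weighted_kappa([0, 1, 2], [2, 1, 0], 3))              # -1.0
```

A value of 1 means perfect agreement and 0 means agreement no better than chance; the sketch assumes that the raters do not both assign every case to one and the same class, since the denominator would then be zero.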

The work done up to now in the thesis has centered mainly on two studies:
first, the construction of a classifier of diabetic retinopathy severity
from the information encoded in images of the patient’s retina using deep
neural networks [4]; second, the improvement of the classification quality
using a learning approach based on the ordinal information of the QWK [3].


2 Diabetic retinopathy detection using deep neural networks 


The traditional model of pattern recognition has been based on extracting
hand-crafted features or fixed kernels from the image and using a trainable
classifier on top of those features to get the final classification. Under
this scheme, DR detection has been based on hand-engineering features for
the detection of microaneurysms, haemorrhages and exudates in retinal images
that maximize the performance of the classifier. This type of approach
requires a good understanding of the mechanism of the disease, requires a lot
of labor and is very task-specific, and is therefore not reusable for other
classification problems.

In this first work of the thesis we explore a completely different approach
consisting of automatic feature learning. We use a deep convolutional neural
network model for predicting the probability of each of the five
standardised DR severity levels [4]. The model is trained using a logarithmic
loss function, optimization algorithms based on stochastic gradient descent,
and a set of data augmentation techniques. The details of the training
procedure can be found in [4]. The study was done with the EyePACS dataset
from Kaggle. This image set has about 88,000 retina images, which are labeled
by expert physicians.

To improve the classification rate, we use a probabilistic combination of
the information that can be obtained from both eyes of the same patient. DR
usually affects both eyes, especially when the illness is in high severity
stages. The dataset used is big enough to infer, from the frequencies of
co-occurrence of the classes, the conditional probabilities of having one
class in one eye given another class in the other, P(Left|Right) and
P(Right|Left). Let P(Left) and P(Right) be the probability distributions
obtained by our predictive model with the left image and the right image,
respectively. We can then estimate $P_L$ and $P_R$ using
$P_L = P(Left|Right)P(Right)$ and $P_R = P(Right|Left)P(Left)$. To merge the
value obtained from the model with the estimation coming from the other eye,
we calculate the arithmetic mean. The class with maximum value is the one
selected for each eye.
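
The combination of both eyes can be illustrated with a toy example. In this sketch the product P(Left|Right)P(Right) is read as a matrix-vector product, that is, a sum over the classes of the other eye; this reading, the three-class setting and all the numbers are assumptions made for illustration only.

```python
def estimate_from_other_eye(cond, other):
    """cond[i][j] = P(this eye = i | other eye = j); other[j] = model output."""
    return [sum(cond[i][j] * other[j] for j in range(len(other)))
            for i in range(len(cond))]


def fuse(own, cond, other):
    """Average the model output with the estimate from the other eye and
    return the index of the most probable class."""
    estimate = estimate_from_other_eye(cond, other)
    merged = [(p + q) / 2 for p, q in zip(own, estimate)]
    return max(range(len(merged)), key=merged.__getitem__)


# Toy values: each column of `cond` sums to 1.
cond = [[0.80, 0.10, 0.10],
        [0.15, 0.80, 0.20],
        [0.05, 0.10, 0.70]]
left = [0.45, 0.40, 0.15]    # model output for the left eye
right = [0.10, 0.80, 0.10]   # model output for the right eye

print(fuse(left, cond, right))  # 1
```

In this example the left eye alone would be assigned class 0, but the strong evidence for class 1 in the right eye moves the merged decision to class 1.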


3 Improving the classification rate with QWK loss function 


The optimization of neural networks for multi-class classification is
traditionally done using the logarithmic loss. The logarithmic loss has a very
robust probabilistic foundation: minimizing it is the same as minimizing the
negative log-likelihood, which is equivalent to performing Maximum Likelihood
Estimation (MLE) or, equivalently, to finding the Maximum a Posteriori
Probability (MAP) estimate given a uniform prior [7]. This loss function is
designed to find perpendicular vectors in the output space. This model is
suitable when the output classes are independent, but it may not be good in
cases where classes are ordered. This is the case for some disease prediction
tasks, where an incremental severity scale is present. In those cases an
ordinal regression approach is normally better.

As the Quadratic Weighted Kappa index is designed to evaluate ordinal
ratings, we explored the possibility of replacing the log-loss function with
a QWK-loss function in deep neural network training [3]. We defined the
optimization procedure in terms of QWK and showed that, for DR severity
prediction, classification improves by more than 5%. This method is directly
generalizable to other multi-class classification problems where there is
prior knowledge about the predefined ordering of the classes.


4 Conclusions and future work 


With the work done up to now we have been able to model diabetic
retinopathy detection using supervised deep learning techniques. Using the
new QWK-loss function, we obtained up to a 5% increase in the classification
rates over the standard approach. Moreover, thanks to the probabilistic
combination of the results of both eyes, we have been able to improve the
results of the model even further, reaching human expert level performance.

The results of our study show that the direct optimization of the QWK index
achieves better generalization results in different datasets with ordered
output classes. Log-loss has to learn the predefined ordering of the classes
from data, and this seems to be a disadvantage. Results showed that,
depending on the use case, an improvement of between 6% and 10% can be
obtained from the direct optimization of QWK. This is a significant
improvement that may be especially worthwhile in medical diagnosis, since an
accurate detection of the level of severity of a disease usually has great
influence on the treatment prescription and on the possibility of minimizing
the bad consequences of the illness.

Future work will be centered on finding a human-understandable
interpretation of the results given by the model and on the use of
unsupervised learning techniques to perform the classification.

1 Introduction


Sentiment Analysis (SA), also known as Opinion Mining, is the task of
automatically identifying the opinions of customers about a product (or
service, social event, etc.) from customer textual comments crawled from
various social media resources. It normally involves the classification of
text into categories such as “positive”, “negative” and “neutral”. SA can be
done at different levels. Coarse-grained analysis attempts to extract the
overall polarity at the document or sentence level, whereas, at a
fine-grained level of analysis, the problem is to identify the sentiment
polarity towards a certain target in a given text (target-dependent
sentiment analysis) [5, 9].

Most of the existing systems for SA and targeted SA are inspired by the
work presented in [7]. Machine learning techniques have been used to build
a classifier from a set of tweets with manually annotated sentiment polarity.
The success of machine learning models rests on two main factors: a large
amount of labeled data and the intelligent design of a set of features that
can distinguish between the samples. Within this approach, most studies have
focused on designing a set of effective features to obtain good
classification performance. For instance, the authors in [6] used diverse
sentiment lexicons and a variety of hand-crafted features. To leverage
massive sets of tweets containing positive and negative emoticons for
automatically learning the relevant features, several deep learning models
have been proposed. For example, the work presented in [8] trained a
convolutional neural network (CNN) to learn the best features and used it to
classify the sentiment of the tweets.

Following those approaches, we have developed systems to perform the
sentiment analysis of tweets on two levels: coarse-grained and fine-grained.
Based on the approach employed, we categorise the systems into two groups:
machine learning models and deep learning models. The next sections briefly
explain those systems.

2 Machine Learning Models for SA


Within this approach, we have developed three systems: SentiRich [1], En- 
SITAKA and Ar-SITAKA [2]. All these models are based on Support Vector 
Machines (SVMs). The basic steps of SentiRich are the following: 

Preprocessing: in the preprocessing stage, URLs and user mentions are
removed, and all the text of the tweet is converted to lower case. After that,
it is tokenized and POS-tagged using the Ark Tweet NLP tool. The suffix
“_NEG” is added to all the words that appear in a negated context, which is
a segment of a tweet that starts with a negation (e.g. no, don’t) and ends
with a punctuation mark.

Feature Extraction: the polarity of a tweet (positive, negative or neutral) 
is determined by SentiRich, which is fed with the following features: 


• Basic text features: n-grams (contiguous sequences of n tokens, with n
  from 1 to 4) and negated n-grams (the same information, but only with
  the tokens that appear in negated contexts).

• Syntactic features: number of occurrences of each POS and bi-tagged
  features (combination of bi-grams with their POS tags).

• Lexicon features: the estimation of the polarity of the tweet according
  to seven popular opinion lexicons. The information about the
  positive/negative polarity of each word is combined, as described in [1],
  to obtain a global polarity of the tweet for each lexicon. Other
  lexicon-dependent features in this category include the average polarity
  of the positive/negative terms, the score of the last positive/negative
  term, and the maximum/minimum positive/negative score.

• Semantic features: each word of the tweet is mapped to a predefined
  cluster that groups together words that have similar meanings. Two sets of
  semantic clusters were used: the 1000 defined in the Ark Tweet NLP tool
  and the 4960 n-gram clusters obtained with the Word2vec tool.


Classification: SentiRich determines the polarity (positive, negative or
neutral) at the tweet level. It is a classifier based on an SVM. This
classifier was trained using the Twitter2013 train and development sets from
SemEval2013, a well-known worldwide competition of natural language
processing systems based on semantic analysis. The accuracy of the classifier
was evaluated by comparing its performance with that of the top systems in
the last SemEval competitions, using different data sets from these events.
These results showed that the system obtained, in most of the cases, levels
of accuracy that outperformed those of the state-of-the-art sentiment
analysis systems (around 68% and 72%, depending on the input set).
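
The preprocessing step and the basic text features described above can be sketched as follows. The regular-expression tokenizer is a simple stand-in for the Ark Tweet NLP tool, and the negation word list is illustrative; neither is taken from SentiRich itself.

```python
import re

# Illustrative negation list (an assumption, not the one used by SentiRich).
NEGATIONS = {"no", "not", "never", "don't", "can't", "won't"}
PUNCTUATION = re.compile(r"^[.,:;!?]+$")


def preprocess(tweet):
    """Lowercase the tweet, drop URLs and user mentions, and tokenize."""
    text = re.sub(r"https?://\S+|@\w+", "", tweet.lower())
    return re.findall(r"[\w']+|[.,:;!?]", text)


def mark_negation(tokens):
    """Append _NEG to tokens between a negation and the next punctuation."""
    out, negated = [], False
    for tok in tokens:
        if PUNCTUATION.match(tok):
            negated = False          # punctuation closes the negated context
            out.append(tok)
        elif negated:
            out.append(tok + "_NEG")
        else:
            out.append(tok)
            negated = tok in NEGATIONS
    return out


def ngrams(tokens, max_n=4):
    """All contiguous n-grams with n from 1 to max_n."""
    return [" ".join(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)]


tokens = mark_negation(preprocess(
    "I don't like this phone, but @bob loves it http://t.co/x"))
print(tokens)
# ['i', "don't", 'like_NEG', 'this_NEG', 'phone_NEG', ',', 'but', 'loves', 'it']
```

The resulting tokens and their n-grams would then be turned into feature vectors for the SVM, together with the syntactic, lexicon and semantic features.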

En-SITAKA and Ar-SITAKA are extended versions of SentiRich where 
extra types of features have been added. 


• Embedding features: word embeddings are an approach to distributional
  semantics that represents words as vectors of real numbers. We used
  sum, standard-deviation, min and max pooling functions to obtain the
  tweet representation in the embedding space.
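
A minimal sketch of these pooling functions is given below. It assumes that every word of the tweet has already been mapped to a vector of the same length, that the pooled vectors are concatenated, and that the population standard deviation is used; these details are assumptions, since the text only names the pooling functions.

```python
import statistics


def pool_embeddings(vectors):
    """Concatenate sum, standard deviation, min and max over each dimension."""
    columns = list(zip(*vectors))  # one tuple per embedding dimension
    return ([sum(c) for c in columns]
            + [statistics.pstdev(c) for c in columns]
            + [min(c) for c in columns]
            + [max(c) for c in columns])


# Two toy word vectors of dimension 2.
print(pool_embeddings([[1.0, 2.0], [3.0, 6.0]]))
# [4.0, 8.0, 1.0, 2.0, 1.0, 2.0, 3.0, 6.0]
```

With d-dimensional embeddings this yields a fixed-length vector of 4d features, regardless of the number of words in the tweet.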


En-SITAKA was trained using the training sets of English-language tweets
provided in SemEval13-16, whereas Ar-SITAKA was trained using the
Arabic-language tweets provided by the organizers of SemEval17. The systems
have been tested on 12,284 English-language tweets and 6,100 Arabic-language
tweets, also provided by the organizers of SemEval2017. The gold labels of
all test tweets were withheld by the organizers. En-SITAKA ranked 8th among
38 systems and Ar-SITAKA ranked 2nd among 8 systems in the SemEval2017
competition.

As a case study, SentiRich has been used to analyse 3,000 tweets by local
residents and 3,000 tweets by tourists at 10 major destinations in Europe,
with the aim of finding out the positivity, neutrality or negativity of their
published tweets [4].


3 Deep Learning Models for SA 


Deep learning techniques for SA have become very popular. They provide
automatic feature extraction, richer representation capabilities and better
performance than traditional feature-based techniques (i.e., those relying on
hand-crafted features). We have leveraged Recurrent Neural Networks (RNNs) to
solve the targeted SA problem.

We have developed a system called target-dependent bidirectional gated 
recurrent unit (TD-biGRU) [3]. It has the ability to represent the interaction 
between the target (an entity, like a person, organisation, product, object, etc., 
referred to in a text, about which an opinion is expressed) and its context (the 
text surrounding it). Its main steps are the following. First, the words of the 
input sentence are mapped to vectors of real numbers. This step is called vector 
representation of words or word embedding. Afterwards, the input sentence is 
represented by a real-valued vector using a bidirectional gated recurrent unit. 
This vector summarizes the input sentence and contains semantic, syntactic
and/or sentiment information based on the word vectors. Finally, this vector
is passed through a softmax classifier to classify the sentence as positive,
negative or neutral.
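
The last step, the softmax classifier, can be sketched on its own. The weights, biases and the two-dimensional sentence vector below are toy values chosen for illustration; the embedding and bidirectional GRU layers that would produce the sentence vector in TD-biGRU are not reproduced here.

```python
import math

LABELS = ("positive", "negative", "neutral")


def softmax(scores):
    """Turn raw scores into a probability distribution."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(sentence_vector, weights, biases):
    """Linear layer followed by softmax; returns the label and probabilities."""
    scores = [sum(w * x for w, x in zip(row, sentence_vector)) + b
              for row, b in zip(weights, biases)]
    probs = softmax(scores)
    return LABELS[probs.index(max(probs))], probs


label, probs = classify([1.0, 0.0],
                        [[2.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
                        [0.0, 0.0, 0.0])
print(label)  # positive
```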

We have evaluated the effectiveness of the proposed model on a benchmark 
dataset from Twitter. The experiments show that TD-biGRU outperforms the 
state-of-the-art methods for target-dependent sentiment analysis. 

1 Prefetching Overview


A cache holds very little data, so it is necessary to decide which data are
best suited to be stored in it. The goal is to increase the probability of
finding them in the cache and so avoid the main memory access. We have to
decide when, how and where to copy data into the cache, when, how and where
to look for data in the cache, and when, how and where to evict data from
the cache. This translates into hardware algorithms for placement, lookup
and replacement.

Observing the access patterns of typical programs, we can see a property
called data locality: after accessing a datum, it is very likely that the
same datum, or one near it, will be accessed in the near future. There are
two types of locality: temporal locality (the same data) and spatial locality
(data next to the accessed one). Cache designs try to take advantage of this
property.

In order to exploit spatial locality, instead of bringing only one word
(4 or 8 bytes) caches bring a full block (16 to 128 bytes). This is a basic
prefetch mechanism. It is very useful for scientific programs that access
vectors and matrices sequentially, but increasing the block size can increase
cache pollution.
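
The effect of the block size on a sequential access pattern can be illustrated with a toy simulation. The direct-mapped organization, the cache size and the access pattern are assumptions chosen for the example.

```python
def count_misses(addresses, block_size, num_blocks):
    """Count the misses of a direct-mapped cache with `num_blocks` lines."""
    lines = [None] * num_blocks      # block number stored in each line
    misses = 0
    for addr in addresses:
        block = addr // block_size   # memory block that contains the address
        index = block % num_blocks   # cache line the block maps to
        if lines[index] != block:
            misses += 1
            lines[index] = block     # bring the whole block into the cache
    return misses


# Sequential scan of 1024 bytes in 4-byte words (256 accesses).
words = range(0, 1024, 4)
print(count_misses(words, block_size=4, num_blocks=8))   # 256
print(count_misses(words, block_size=64, num_blocks=8))  # 16
```

With one-word blocks every access misses, whereas 64-byte blocks turn 15 out of every 16 accesses into hits. The sketch does not model pollution, which appears when the extra words brought with a block are never used and evict useful data.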

Caches are initially empty, so at the beginning some accesses will produce
a miss (known as a compulsory miss). This behaviour leads to a research
challenge: reducing the compulsory misses by bringing blocks into the cache
before the processor asks for them. This is the target of all prefetching
mechanisms.

Prefetching can be applied at any level of the memory hierarchy, in software
or in hardware. At present, most processors are multicore/multithreaded, and
in this scenario caches can be shared between cores and threads. Several
problems arise with prefetching in all those alternatives.

2 Prefetching Challenges


With the aim of providing good performance in terms of execution time,
many prefetching [4] mechanisms have been proposed for unicore processors,
the main challenge being to choose the next useful block correctly and with
the correct timing. Multicore prefetching [5] has two main focuses: one is
to bring data as near as possible to the core that will use it (private
caches) and the other is to bring data from main memory to the main shared
cache of the processor. The first has to deal with the cache coherence
protocol and the second has to deal with the resources shared by the cores
(network, external connections, ...). Many proposals [7][8] use the extra
computing capacity of multicores/multithreads to execute the prefetch
mechanism, at the expense of not using it for raw computation. The different
approaches have to deal with this dilemma: either use complex prefetch
alternatives that consume a lot of processor resources, or simple prefetch
alternatives that do not perform as well.

One resource that influences memory latency is the network that connects
the cores with the shared cache and the external main memory. A good
prefetching algorithm should not produce an increase of traffic in the memory
interconnection. An increase in memory traffic entails higher power
consumption and a higher degree of contention in the interconnection network.
Note that the number of memory requests is in most cases going to be higher
than in a system without prefetching. This is especially true in a multicore
system, because prefetching increases on-chip communication, since coherence
between the L1 caches of the tiled CMP must be ensured. An increase in the
congestion of the network will most probably increase the latency not only
of prefetching requests but also of regular memory operations.

Our proposal focuses on network congestion in order to dynamically reorder
data accesses, prioritizing regular requests to the detriment of prefetch
requests, to avoid increases in the latency of regular data accesses.


3 Prioritization into the network: Our proposal 


Traditionally, memory systems do not differentiate between prefetch and
regular requests. Recently, a number of approaches [9][10][11] have appeared
that assign different priorities to the two types of requests depending on
the predicted behaviour, since it has been shown that delaying regular
requests may degrade performance if prefetch requests are not accurate.

The prefetcher mechanism sends requests to the network to bring data from
external memory to internal caches or to move data between the internal
caches. Those requests have to coexist with regular data accesses and travel
together within the network subsystem. One challenge is that prefetching
should not penalize regular accesses, but the network subsystem does not have
information about the different types of requests.

In our proposal the prefetcher mechanism will add information to its data
accesses in order to mark them as prefetch requests. It will also add
information related to the timing of the prefetch. Network routers will use
that information to reorder the requests in their queues and to apply or
modify their priority. Moreover, they will use dynamic congestion information
in order to apply different policies. Regular accesses have maximum priority
and prefetch accesses have a variable priority depending on their request
time.

Another possibility of the mechanism is to discard prefetch accesses when
they arrive after their request time, and also to discard some of them when
the network utilization is near full capacity.

Several experiments will determine the optimal values of the priorities and
thresholds needed by the mechanism in order to modify priorities and discard
some or all of the prefetch accesses.
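
The queue policy outlined in this section can be sketched in software as follows. This is a hypothetical model for illustration, not the proposed hardware: the class and field names, the priority formula, the `deadline` field (standing for the time by which the prefetched data is needed) and the discard threshold are all assumptions.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

REGULAR_PRIORITY = 0  # smaller value = served first


@dataclass(order=True)
class Request:
    priority: int
    seq: int                                   # keeps FIFO order on ties
    address: int = field(compare=False)
    is_prefetch: bool = field(compare=False, default=False)
    deadline: int = field(compare=False, default=0)


class RouterQueue:
    """Hypothetical router queue that favours regular requests."""

    def __init__(self, capacity, discard_threshold=0.9):
        self.capacity = capacity
        self.discard_threshold = discard_threshold
        self.heap = []
        self.seq = count()

    def push(self, address, now, is_prefetch=False, deadline=0):
        """Enqueue a request; return False if a prefetch is discarded."""
        if is_prefetch:
            late = now > deadline
            congested = (len(self.heap)
                         >= self.discard_threshold * self.capacity)
            if late or congested:
                return False
            # The closer the deadline, the higher the priority.
            priority = 1 + max(0, deadline - now)
        else:
            priority = REGULAR_PRIORITY
        heapq.heappush(self.heap, Request(priority, next(self.seq),
                                          address, is_prefetch, deadline))
        return True

    def pop(self):
        """Return the address of the highest-priority request."""
        return heapq.heappop(self.heap).address


q = RouterQueue(capacity=10)
q.push(0x10, now=0, is_prefetch=True, deadline=5)
q.push(0x20, now=0, is_prefetch=True, deadline=2)
q.push(0x30, now=1)                                       # regular request
print(q.push(0x40, now=9, is_prefetch=True, deadline=5))  # False (late)
print([hex(q.pop()) for _ in range(3)])  # ['0x30', '0x20', '0x10']
```

The regular request overtakes both prefetches even though it arrived last, and the late prefetch never enters the queue.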


4 Experimental Framework 


The SimpleScalar Tool Set [1] provides simulators ranging from a fast
functional simulator to a detailed out-of-order issue processor that supports
non-blocking caches, speculative execution, and state-of-the-art branch
prediction. In this work, we use sim-outorder to obtain program statistics.

The Standard Performance Evaluation Corporation (SPEC) [2] is an
organization founded in 1988 that provides several families of benchmarks to
measure the performance of different computer systems. In this work we
consider the CPU family, designed to provide performance measurements that
can be used to compare compute-intensive workloads on different computer
systems. In particular, we consider the following suite of the CPU family:
SPEC CPU2006.

The Opnet Modeler Suite [3] provides a suite of protocols and technologies
to design, model, and analyze communication networks.


5 Conclusions and Future Work 


Prefetching in multicore processors poses new design challenges, which have
to deal with sharing the available resources of the processor between demand
requests and prefetch requests. We focus on network congestion as a measure
to reorder, deprioritize and also discard prefetch requests. In this way,
tuning the values of our mechanism correctly will reduce the global average
access time of memory accesses.

1 Introduction


Cost-effective third-party (cloud) service providers offer very convenient 
data storage and computation services at a low cost, thus providing an at- 
tractive alternative to other forms of storage. However, outsourcing potentially 
sensitive datasets to an external cloud service provider poses many security 
and privacy concerns. 

A natural approach to address these security concerns consists in applying
cryptographic techniques. However, traditional symmetric-key encryption
techniques fail to provide an efficient solution. A trivial option consists
of encrypting all data using a symmetric-key encryption scheme and using the
server only for Storage-as-a-Service. To perform a computation, all relevant
data are retrieved, decrypted and computed on locally. Unfortunately, this
solution may not be efficient, particularly if client devices have limited
computational power or storage capacity, and it requires high bandwidth
during queries. Alternative cryptographic schemes must be developed in order
to overcome this obstacle.

In recent years there have been important advances in cryptographic techniques
that make it possible to take advantage of the economic and functional benefits
of cloud computing while securing the data. Two of these techniques are Ho- 
momorphic Encryption (HE) and Secure Multi-party Computation (SMC), 
both of which allow for remote computations over encrypted data. 

Secure Multi-party Computation protocols are interactive protocols that 
allow a set of parties to jointly compute a function over their inputs. In SMC, 
parties keep their inputs private and engage in an interactive protocol with 
each other, so that at the end of the protocol each party learns the function 
evaluation and nothing else about inputs from other parties. 
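
As a minimal sketch of the idea behind SMC, the following toy example computes a joint sum with additive secret sharing. It is only one simple instance of such a protocol: the modulus `PRIME`, the function names and the simulation of all parties inside a single process are assumptions made for illustration, and the sketch ignores network communication and malicious behaviour.

```python
import secrets

PRIME = 2**61 - 1  # public modulus (illustrative choice); inputs must lie in [0, PRIME)


def share(value, n_parties):
    """Split `value` into n_parties random shares that add up to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_inputs):
    """Simulate a joint sum: no party ever sees another party's raw input."""
    n = len(private_inputs)
    # distributed[i][j] is the share that party i sends to party j
    distributed = [share(x, n) for x in private_inputs]
    # each party j adds the shares it received and publishes only this partial sum
    partial = [sum(distributed[i][j] for i in range(n)) % PRIME for j in range(n)]
    # the published partial sums reveal the total and nothing else
    return sum(partial) % PRIME
```

For instance, `secure_sum([12, 30, 0])` returns 42, while each individual share is a uniformly random value that carries no information about the input it was derived from.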

Homomorphic Encryption schemes allow computations to be performed 
directly on encrypted data, and they are classified according to the opera- 
tions they support. Additive HE (such as [6]) and multiplicative HE schemes 
efficiently support a single operation on ciphertexts, that is, addition and
multiplication respectively. Somewhat Homomorphic Encryption (SHE) schemes
support any number of additions, but a limited number of multiplications. 
Fully Homomorphic Encryption (FHE) schemes support an arbitrary number 
of additions and multiplications on ciphertexts. Unfortunately, all known FHE 
schemes are computationally very expensive, which hinders their applicability 
in practice. 
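
To make the notion of additive HE concrete, the sketch below implements a toy Paillier-style scheme using only the Python standard library. The primes, the function names and the choice of generator g = n + 1 are assumptions made for illustration; the key size is far too small to be secure, and the code is not meant to reproduce any particular scheme cited in this abstract.

```python
import math
import secrets


def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy key generation. p and q are small known primes: NOT a secure key size."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+); valid because g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt an integer 0 <= m < n as (n + 1)^m * r^n mod n^2."""
    n2 = n * n
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:  # the blinding factor must be invertible modulo n
        r = secrets.randbelow(n - 1) + 1
    return (1 + m * n) * pow(r, n, n2) % n2  # (n + 1)^m = 1 + m*n (mod n^2)


def decrypt(n, private_key, c):
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def add(n, c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds the plaintexts."""
    return c1 * c2 % (n * n)


def scale(n, c, k):
    """Multiplication of the plaintext by a public integer k."""
    return pow(c, k, n * n)
```

With `n, priv = keygen()`, decrypting `add(n, encrypt(n, 20), encrypt(n, 22))` yields 42 without the party performing the addition ever seeing 20 or 22. Products of two encrypted values are not available in such a scheme, which is what distinguishes it from SHE and FHE.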


2 Secure Interpolation in the Cloud 


Interpolation and regression techniques such as generalized least squares,
polynomial regression and spline interpolation are often used in practical
applications, for example to predict values of some phenomenon given
a set of samples. They have a wide range of applications, from
computer graphics to data analysis and experimental design.

Outsourcing such computations to the cloud can offer numerous cost-saving 
and practical benefits, since applications often involve massive datasets and 
expensive computations. Data ubiquity is also very convenient, as such com- 
putations can involve data owned by multiple organizations, or they can be 
requested by multiple parties. 

However, outsourcing computations to the cloud can pose security and pri- 
vacy concerns, since applications usually involve potentially sensitive datasets. 
Therefore, we aim to provide practical solutions to enable clients to efficiently 
delegate an encrypted dataset to a semi-trusted server, in such a way that 
interpolation computations can be performed directly over encrypted data. 

Following the previous discussion, we may look for a solution involving HE 
schemes. The main obstacle to applying this approach is that the considered 
computations often involve complex operations, requiring many additions and 
products. Some interpolation techniques involve computations that are cur- 
rently challenging even when using FHE, including the computation of square 
roots, natural exponentiations or solving systems of linear equations. In order 
to overcome this obstacle, we look for tailored adaptations of the interpolation 
computations, so that we can apply HE schemes and enable the delegation of 
interpolation computations to the cloud. 


3 Private Outsourced Kriging Interpolation 


Kriging [1, 3, 5, 8] is a well-recognized form of linear interpolation widely 
used with datasets involving spatially correlated data. It aims at predicting 
the value of some phenomenon at an unobserved location in a two-dimensional
region. This interpolation method was designed with geo-statistical applica- 
tions in mind (e.g. to predict the best location to mine within a region, based 
on the mineral deposits found at previous boreholes), but has also found applications
in a variety of settings including remote sensing, real-estate appraisal
and computer simulations. Kriging has been identified as a good candidate 
process to be outsourced to the cloud, based on the practical and legislative 
requirements of industrial users [2, 4]. 

Based on a recent work carried out in conjunction with James Alderman, 
Benjamin Curtis, Oriol Farras and Keith M. Martin, we present a method 
for the efficient private outsourcing of Kriging interpolation. The proposed 
solution uses a tailored modification of the Kriging algorithm in combination 
with additively homomorphic encryption, allowing crucial information relating 
to measurement values to be hidden from the cloud service provider. Moreover, 
with the exception of the high one-time cost of encrypting the dataset, the 
remaining client-side processes are very efficient. We evaluate the performance 
of our solution through an implementation in Python 3.4.3, using the PHE 
library [7]. 
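
As a hedged illustration of why an additively homomorphic scheme fits a linear interpolator: a Kriging prediction at an unobserved location is a weighted sum of the measured values, so a server holding encrypted measurements can combine them using only ciphertext products and exponentiations by public weights. The notation below ($Z$, $s_i$, $\lambda_i$, $\mathrm{Enc}$, $N$) is introduced here for illustration, assuming a Paillier-style scheme with modulus $N$ and weights encoded as integers. How the actual protocol computes the weights and splits the work between client and server is not described in this abstract.

```latex
\hat{Z}(s_0) = \sum_{i=1}^{n} \lambda_i \, Z(s_i),
\qquad
\mathrm{Enc}\bigl(\hat{Z}(s_0)\bigr)
  = \prod_{i=1}^{n} \mathrm{Enc}\bigl(Z(s_i)\bigr)^{\lambda_i} \bmod N^2 .
```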

Since the approach followed for Kriging interpolation is applicable to other 
interpolation techniques and statistical tools, a next step in this line of work 
is to develop solutions for other similar techniques. 

The proposed results have been presented at the 5th Workshop on En- 
crypted Computing and Applied Homomorphic Cryptography (WAHC’17). 
1 Introduction 


Classification of food-related places is one of the most recent and promising
applications in the area of scene recognition. It helps to analyze nutritional
intake based on food-related activity. Lifelogging captures and analyzes data
sources that record the events and patterns of a person’s life by means of
wearable sensors, such as wearable cameras. In visual lifelogging, images are
captured by wearing a camera over a long period of time, which shows the daily
experience of the camera user. We aim to work on the recognition of food places
by analyzing images captured through visual lifelogging. In this work, a
computer-vision-based method for recognizing food-related places is introduced.
Ultimately, we will develop and implement a fully automated food profiling
system that analyzes images of food-related places. Currently, we use a
deep-learning-based approach for classifying food-related places, which has
shown promising results.


2 Proposed Approach 


2.1 Visual Lifelogging Dataset 


Lifelogging is a procedure in which personal data produced by human 
behavioral activities is tracked and recorded. It tracks personal activity data, 
such as exercising, sleeping and eating. To collect the images for our visual
lifelogging dataset, we used a wearable camera named “Narrative clip” [1].
Figure 1 shows the Narrative clip camera.

Fig. 1: Narrative clip camera 


This camera can collect a large number of images thanks to its continuous
capture capability (2-3 images per minute, 1500 per day and 70000 per year).
After gathering all of the images, we labeled them with the class name of the
place where they were taken (e.g., restaurant, supermarket, kitchen, etc.) and
named the resulting dataset “Egoplaces”. Egoplaces contains thirty-three
thousand images covering 28 food-related places in total. The details of the
dataset are shown in Table 1.


Table 1: Description of the “Egoplaces” dataset. 


Class name           No. of images  Class name           No. of images  Class name           No. of images  Class name           No. of images
bakery_shop          1038           cafeteria            1476           food_court           161            pizzeria             1005
balcony_interior     527            candy-store          154            greenhouse_indoor    55             pub_indoor           182
banquet_hall         206            cocktail             526            ice_cream_parlor     86             restaurant           4001
bar                  1362           coffee_shop          1607           kitchen              4058           restaurant_patio     63
bazaar_indoor        217            delicatessen         745            market_indoor        1027           supermarket          3141
beer_hall            339            dining_room          3681           market_outdoor       1813           sushi_bar            127
butchers_shop        81             fastfood_restaurant  981            picnic_area          720            workplace_office     3654


The dataset comprises 28 classes of food-related places, collected with an
egocentric camera. Its classes cover a rich variety of the visual
surroundings of our daily life.


2.2 Deep Learning 


Deep learning is a branch of machine learning that has revolutionized the
area of artificial intelligence. It is widely used in computer vision and natural
language processing, where it outperforms most earlier approaches. Neural
networks (NNs) are among the best-known learning systems in machine learning.
In deep learning, convolutional neural networks (CNNs) are mainly used for
object detection and image classification or recognition. Deep learning is
based on artificial neural networks (ANNs), which learn from data (training)
and apply the learned knowledge to new data (testing). The core concept of
ANNs is inspired by the way the human brain works. Deep learning has become
very powerful thanks to recent advances in graphics processing units (GPUs)
and central processing units (CPUs), and to the large amount of data that is
publicly available nowadays. In this work, we used state-of-the-art CNN
models to classify food-related places.

Currently, deep-learning-based models are also used in mobile applications
such as “Snap-n-Eat” [6] and “Lose it” [2] to detect and recognize different
types of food items. However, no work has been done on scene recognition for
food places. Bolei Zhou et al. [7] proposed a novel scene classification
method using deep CNN features, and compared the density and diversity of
images using the “Places” dataset. However, that method classifies scenes
into general place categories such as airfield, art studio, bathroom and
classroom, rather than food-related places.


3 Experimental Results 


In the initial stage of the experimental setup, the dataset was split into
80% of the images for training and the remaining 20% for testing. We used an
NVIDIA GTX 1070 GPU with 8 GB of memory, which made it possible to run complex
network architectures. The deep learning framework is Keras 2.0.3 (the latest
version at the time of writing) with the TensorFlow backend. In this work, we
proposed a deep-learning-based recognition system that classifies different
types of food places using state-of-the-art CNN models. We used transfer
learning, that is, fine-tuning of ImageNet pre-trained models, because it
currently gives the best accuracy for image classification tasks. We used
three different models: VGG16 [4], ResNet50 [3] and InceptionV3 [5]. The
classification results show that the InceptionV3 model yields the highest
accuracy among the models used. Figure 2 shows the InceptionV3 model
architecture for our 28-class food-related places classification task. We
modified it by removing the intermediate auxiliary logits output to adapt it
to our problem.


[Figure: a food-related scene image is fed to the deep convolutional neural network (InceptionV3), built from convolution, average-pooling, max-pooling, concatenation, dropout, fully connected and softmax layers, which outputs one of the 28 classes (e.g. balcony_interior, banquet_hall, kitchen, supermarket, sushi_bar, workplace_office).]


Fig. 2: InceptionV3 model architecture for the food related scene classification. 


ImageNet classifiers have 1000 output classes, but we have only 28 classes of
food-related places, so we resized the last fully connected (FC) layer of each
network to 28 outputs. We tested the default models on Egoplaces and also
retrained only the final layer on top of the pre-trained model. We found that
the default models and the models with a retrained last layer could not exceed
40% and 60% accuracy, respectively. To overcome this problem, we retrained the
full pre-trained models up to the softmax layer and obtained a much higher
accuracy. It is necessary to tune the optimization parameters, for example the
weight decay, which mitigates overfitting and helps to balance variance and
bias. For training, we used stochastic gradient descent with a momentum of
0.9, a learning rate of 0.001 and a batch size of 64. The classification
accuracy of our models is listed in Table 2.
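
The optimizer settings mentioned above can be illustrated with a small, framework-free sketch of the classical momentum update. It is not the training code used in this work: the quadratic toy objective, the function names and the `weight_decay` default are assumptions chosen for illustration, and in real training the gradient would be computed on a mini-batch of 64 images rather than on a closed-form function.

```python
def sgd_momentum_step(weights, velocity, grad, lr=0.001, momentum=0.9, weight_decay=0.0):
    """One SGD step with classical momentum and L2 weight decay on lists of floats."""
    new_weights, new_velocity = [], []
    for w, v, g in zip(weights, velocity, grad):
        g = g + weight_decay * w       # gradient of the L2 penalty term
        v = momentum * v - lr * g      # velocity accumulates past gradients
        new_velocity.append(v)
        new_weights.append(w + v)      # move along the velocity
    return new_weights, new_velocity


def minimise_quadratic(steps=5000):
    """Toy run on f(w) = sum(w_i ** 2), whose gradient is 2 * w_i."""
    weights, velocity = [1.0, -2.0], [0.0, 0.0]
    for _ in range(steps):
        grad = [2.0 * w for w in weights]
        weights, velocity = sgd_momentum_step(weights, velocity, grad)
    return weights
```

On this toy objective the weights shrink towards zero. A non-zero `weight_decay` adds a pull towards small weights, which is how it counteracts overfitting.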


Table 2: Comparison of accuracy among the models 


Deep model    Accuracy (%)
VGG16         52.22
ResNet50      73.17
InceptionV3   88.35


4 Conclusion and Future Work 


In this work, we presented a method to classify food-related places using
transfer learning with different CNN models. To achieve a higher accuracy, we
fine-tuned all the network layers, which increased the classification
performance. The results show that the InceptionV3 architecture is able to
learn the features of food-related places with a classification rate of
88.35%. In future work, we will increase the number of images in the proposed
dataset (especially in the categories that have few images) and create our own
deep models for developing a fully automated food profiling system.
1 Purpose 


There is an unstoppable trend towards personalized medicine, which aims to
make diagnosis, treatment and monitoring more effective for each patient. In
this line, we propose the P-BreasTreat work, aimed at the personalized
treatment of breast cancer by developing new computational techniques for
image and data analysis. The ultimate purpose is to improve the effectiveness
of current methods for determining the level of malignancy associated with
these tumors, and also to propose models to prevent relapse and improve the
quality of life of the patients.




Fig. 1: 16 examples of input image samples, 4 for each of the 4 molecular 
subtypes of breast cancer 


In the proposed work, we will develop computer technologies for the
distinction and initial screening of the 4 molecular subtypes of breast cancer
shown in figure 1 (Luminal A, Luminal B, Her2+ and Triple Negative) as advanced
support to traditional pathologic analysis. The impact will be focused on
reducing both the number of biopsies and the adverse psychological effects on
patients. To do this, we will design specific methods of medical image analysis
using Computer Vision and Artificial Intelligence techniques, aimed at
designing new adaptive biomarkers.

Once the molecular subtype has been detected, we will design customized models
for the diagnosis and monitoring of patients treated with neoadjuvant ther- 
apy, conservative surgery and radiotherapy, in order to provide new tools for 
predicting relapse (either local or remote) of breast cancer, anticipating correc- 
tive measures to improve the rate of recovery. These models will also highlight 
critical points of the treatment or disagreements with the clinical standards 
(analysis of adherence). In order to do this, we will apply automatic process 
mining techniques to the evolutionary data of the patients. 


2 Related Works 


Numerous approaches have been proposed to classify breast cancer (BC) tumor
subtypes based on histological information. The method designed by Perou et al.
[2] performed a BC classification into certain “intrinsic” subtypes based on
gene expression patterns. Harbeck et al. [3] presented the guidelines for the
BC molecular subtype categorization based on several immunohistochemistry
(IHC) biomarkers such as estrogen receptors (ER), progesterone receptors
(PR), human epidermal growth factor receptor 2 (HER-2) and antigen Ki-67
(Ki67).

Torrents-Barrena et al. [4] presented the first work to determine the
feasibility of using a CAD system to differentiate among all BC molecular
subtypes in mammograms. The authors designed two classification experiments:
“Luminal A vs. Luminal B”, and “Luminal A vs. Luminal B vs. Her-2+ vs. Basal
Like”. Support Vector Machines (SVM) and Local Binary Patterns (LBP) yielded
the best accuracy: 75% and 52.17% for the two experiments, respectively.
Moreover, in [5] they designed a new methodology based on fractal texture
analysis and unsupervised/supervised classifiers. SVM again achieved the best
performance (76.48% and 55.67%, respectively). The main drawback of both works
was the limited number of Her-2+ and Basal Like samples.


3 Proposed Method 


In this abstract, we propose a semi-automatic CAD system to classify the 
four molecular subtypes of BC from full-field digital mammograms (FFDM). A 
modified VGG16 [6] convolutional neural network architecture is presented to 
learn the underlying micro-texture patterns of the mammogram image pixels 
for each subtype. 

Our hypothesis is that a suitably designed CNN can learn the prototypical
underlying micro-textures of each cancer subtype and that those prototypes
are characteristic of each subtype shown in figure 3, i.e., they are similar
to all samples of the same subtype but different from the micro-texture
prototypes of the other cancer subtypes. Hence, the trained CNN should be
able to predict the subtype of any new breast tumor, given a ROI sample
extracted from its corresponding segmented mammogram.


Fig. 2: Workflow of the classification of the four molecular subtypes of breast
cancer

We will base our design on the VGG16 architecture, since it uses small filters
(3x3) that we expect to be well suited for micro-texture prototype learning,
in contrast to other CNN architectures (e.g. AlexNet) that use larger filters
(11x11) to look for edges, macro-textures or other salient features of the
objects. Since our CNN must learn just pixel-wide micro-texture prototypes
(not the full tumor shapes) of only four classes of cancer subtype, we have
tested several simplifications of the original VGG16 architecture. Concretely,
we have defined smaller sets of filters and reduced the number of neuron layers.


4 Experimental Results 


The experimental data are composed of 179 DICOM mammograms (CC 
and MLO views) distributed in 64, 63, 25 and 27 samples for classes Lumi- 
nal A, Luminal B, Her-2+ and Basal-like, respectively. These medical image 
samples were provided by an oncology group in Spain.

First, we checked the performance of our model by training and validating the
network on the first two classes, Luminal A and Luminal B, which correspond to
the less aggressive cancer subtypes. The network performed very well on
Luminal A samples, achieving 95% accuracy. On the other hand, just 61% of
Luminal B samples were correctly classified, while the remaining 39% were
misclassified as Luminal A. Nevertheless, the network reaches an overall
accuracy of around 78%, which is quite a good result given the evident lack of
visual patterns in the image samples. The second experiment corresponds to
the full 4-class classification, i.e., including all breast cancer subtypes.
From the individual accuracies, we obtain the overall accuracy as the average
weighted by the number of test samples of each class, which yields a fair 67%
of correct predictions.
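
The weighted average used for the overall accuracy can be sketched as follows. The function name is hypothetical, and since the size of the test split is not reported here, the usage example assumes that test samples are distributed like the full dataset (64 Luminal A and 63 Luminal B samples).

```python
def overall_accuracy(per_class_accuracy, samples_per_class):
    """Average of the per-class accuracies, weighted by the samples of each class."""
    total = sum(samples_per_class[c] for c in per_class_accuracy)
    weighted = sum(per_class_accuracy[c] * samples_per_class[c] for c in per_class_accuracy)
    return weighted / total


# Two-class experiment, under the class-proportion assumption stated above.
two_class = overall_accuracy(
    {"Luminal A": 0.95, "Luminal B": 0.61},
    {"Luminal A": 64, "Luminal B": 63},
)
```

Under that assumption `two_class` is about 0.78, in line with the overall accuracy reported for the two-class experiment.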


5 Conclusion and Future work 


In this abstract, we have presented a supervised BC molecular subtype
classification method based on a CNN that analyses manually selected areas
of breast tumors found in DICOM images of mammograms. To the best of
our knowledge, this is the first effort to predict the molecular subtypes of
malignant tumors just from image excerpts of digital mammograms using
CNNs. Previously, we tried other approaches to the same problem using classical
texture descriptors (Uniform Local Binary Patterns, Histogram of Gradients,
Gabor filters, Fractal dimension), but with lower accuracy in the two-class
and four-class experiments ([4]: 75% and 52%; [5]: 76% and 56%; current
approach: 78% and 67%). Other authors have only focused on the automatic
detection of tumors and on determining whether the tumor is benign or
malignant. Future work will aim at validating our approach on larger datasets
of MRI images, with the ultimate objective of gradually bringing computerized
assistance to BC molecular subtype classification into clinical practice.


Acknowledgement. This research has been partly supported by the University Rovira 
i Virgili through Marti-Franques Research Grant and Spanish Government through 
project DPI2016-77415-R. 


References 


[1] G. Shieh, C. Bai, and C. Lee. Identify breast cancer subtypes by gene expression
profiles. Journal of Data Science, 12:165-75, 2004.


[2] C. M. Perou, T. Sørlie, M. B. Eisen, M. van de Rijn, S. S. Jeffrey, C. A. Rees
and Fluge. Molecular portraits of human breast tumours. Nature, 6797:747-752,
2000.


[3] N. Harbeck, C. Thomssen, and Gnant. Brief preliminary summary of the consensus
discussion. Breast Care, 8(2):102-109, 2013.


[4] J. Torrents-Barrena, D. Puig, L. Diez-Presa, M. Arenas, and P. Radeva.
Assessment of a multidiscriminant supervised classifier driven by textural
features to distinguish molecular subtypes of breast cancer. In International
Conference of Computer Assisted Radiology and Surgery (CARS), 10(1):S31-S32, 2015.


[5] J. Torrents-Barrena, A. Valls, P. Radeva, M. Arenas and D. Puig. Automatic
Recognition of Molecular Subtypes of Breast Cancer in X-Ray images using
Segmentation-based Fractal Texture Analysis. In 18th International Conference
of the Catalan Association of Artificial Intelligence (CCIA), 277:247-256, 2015.


[6] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale
image recognition. arXiv preprint arXiv:1409.1556, 2014.


Computational methods for breast density analysis 


Nasibeh Saffari * 


Department of Computer Engineering and Mathematics, Universitat Rovira i Virgili 
Tarragona, Spain 
nasibeh.saffari@urv.cat 


1 Introduction 


Breast cancer is one of the most common causes of death among women and is
prevalent in both developed and developing countries. It is one of the cancers
which can be diagnosed at early stages and can be prevented if treated
appropriately. Early detection of breast cancer improves the treatment process
of the patients and increases their chance of survival. Several risk factors for
breast cancer have been recognized, such as age, family profile, genetics, and
breast density. The amount of fibroglandular tissue content in the breast as
estimated mammographically, commonly referred to as breast percent density
(PD %), is one of the most significant risk factors for developing breast
cancer [1]. Breast density estimation is used to predict the presence of
tumours at an early stage, which can help both doctors and radiologists to
plan an appropriate treatment, either chemotherapy or radiotherapy.
Furthermore, the breast imaging reporting and data system standard (BIRADS) [2],
presented by the American College of Radiology, provides the following breast
density classification:


• BI-RADS I: Almost entirely fatty breast (0-25%). 
• BI-RADS II: Some fibroglandular tissue (26-50%). 
• BI-RADS III: Heterogeneously dense breast (51-75%). 
• BI-RADS IV: Extremely dense breast (76-100%). 

Some of the breast density standards classify the breast tissues into fatty, 
glandular or dense. 
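
The density ranges listed above translate directly into a lookup, sketched here for illustration. The function name is hypothetical, and assigning non-integer values that fall between two listed ranges (for example 25.5%) to the higher category is an assumption, since the ranges in the text are given as integers.

```python
def birads_density_category(percent_density):
    """Map a breast percent density (0-100) to the BI-RADS density category above."""
    if not 0 <= percent_density <= 100:
        raise ValueError("percent density must lie between 0 and 100")
    if percent_density <= 25:
        return "BI-RADS I"    # almost entirely fatty
    if percent_density <= 50:
        return "BI-RADS II"   # some fibroglandular tissue
    if percent_density <= 75:
        return "BI-RADS III"  # heterogeneously dense
    return "BI-RADS IV"       # extremely dense
```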

Fig. 1 shows examples of breast density tissues classification in mammo- 
grams. 

Fig. 1: Breast density tissues classification in mammograms. 


X-ray mammography is considered one of the most popular methods used by
radiologists for screening and early detection. The common screening
mammographic views are craniocaudal (CC) and mediolateral oblique (MLO). The
CC mammographic view is captured from the superior view of a horizontally
compressed breast. In turn, the MLO view is captured from the side of a
diagonally compressed breast, as illustrated in Fig. 2.

Analysing mammograms is not straightforward in every case. Consequently,
computer techniques to diagnose and classify breast density have attracted
researchers' attention in the last decade. Hence, we tried to increase the
accuracy of breast density estimation using multiple feature extraction and
deep learning methods. The human brain has a unique capability for
classification, so scientists are endeavouring to accurately simulate this
ability using deep learning methods.




Fig. 2: Different view types in mammograms. (a) MLO view, (b) CC view. 


2 Methodology 


An overview of the proposed method is illustrated in Fig. 3. 


Fig. 3: Preprocessing stage. 


2.1 Preprocessing 


Resize (100x100) 


Processing and classifying full-size images leads to excessive memory usage
and computational cost. To solve this problem, we resized the images.


Remove pectoral muscle 


Fig. 4 shows an example of a mammogram image before and after the 
preprocessing step. We can notice the absence of the pectoral muscle and the 
labels in the processed image. In addition, the estimated breast boundary is 
shown in blue colour. 


2.2 Feature extraction 


Several feature extraction methods have been used, such as histogram of
oriented gradients (HOG), local binary patterns (LBP) and Gabor filters (GF).
Finally, we applied the fusion of HOG, LBP and GF to improve the accuracy.
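
As an illustration of one of these descriptors, the sketch below computes basic 8-neighbour local binary patterns in plain Python. It is not the feature extractor used in this work: the neighbour ordering, the `>=` comparison and the function names are assumptions, and practical implementations usually rely on an image processing library and on variants such as uniform patterns.

```python
# Neighbour offsets (row, column), clockwise starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(image):
    """Return the 8-bit LBP code of every interior pixel of a 2D list of grey levels."""
    height, width = len(image), len(image[0])
    codes = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            centre = image[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(OFFSETS):
                if image[y + dy][x + dx] >= centre:  # neighbour at least as bright
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def lbp_histogram(image):
    """Normalised 256-bin histogram of LBP codes, usable as a texture feature vector."""
    histogram = [0] * 256
    for row in lbp_codes(image):
        for code in row:
            histogram[code] += 1
    total = sum(histogram) or 1  # avoid dividing by zero on images without interior pixels
    return [count / total for count in histogram]
```

The histogram, rather than the raw codes, is what would typically be concatenated with other descriptors when several feature sets are fused.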
Fig. 4: Preprocessing stage. 


2.3 Classification 


First, we applied a variety of supervised learning classification methods, for
instance support vector machines (SVM), k-nearest neighbours (KNN), fuzzy
c-means (FCM), radial basis function (RBF) and convolutional neural networks
(CNN), to the INbreast dataset in order to create the model. In the next step,
we applied this model to the temporal images, which do not have labels, to
analyze the evolution of breast tissue density and classify it as dense or
fatty. Finally, we present a novel methodology to create a standard, similar
to BIRADS, for classifying breast density.
1 Motivation 


The IT revolution has dramatically changed the infrastructures of
organisations and, especially, their information systems. The importance of
such systems is determined by the role that they play in the business
processes executed in the organisations. A business process consists of a set
of activities aimed at accomplishing a certain organisational goal (e.g. a
product or service) for customers. Due to rapid changes in service delivery
and a fiercely competitive market, business processes need to be controlled.
This has given rise to a management field, called Business Process
Management, that focuses on optimising the performance of the business
processes of organisations, taking advantage of models to facilitate the
identification of bottlenecks.

One of the most complex sectors within a welfare society is the health- 
care sector. This sector and, consequently, its underlying business processes 
operate under a very dynamic and changeable context. Healthcare-related 
processes have some particular characteristics that rarely happen in processes 
from other sectors. Therefore, healthcare-related information systems play an 
interesting role in the gathering of information and the modelling, analysis 
and optimisation of processes in this particular context. This research fits 
perfectly under the umbrella of the Smart Health paradigm, defined as “the 
provision of health services by using the context-aware network and sensing 
infrastructure of smart cities” [10]. Several contributions have been made to- 
wards smart health: detecting wandering behaviour in people with dementia 
[4] [5], context-aware recommender systems [8] and wireless channel characterisation
in medical scenarios [6] [7]. Combining the management of business
processes together with the context-awareness that smart health promotes, 
further improvements can be made in an interesting novel research field: pro- 
cess mining. 
2 Process Mining 


The increasing amount of data stored in the information systems of
organisations is partly due to the records of events produced by the
activities of business processes (e.g. the scheduling of a patient's visit
with a particular doctor, the request for an X-ray image or the submission of
certain documentation to another department). All this process-related
information is usually stored in structured (or semi-structured) files, called
event log files. Since these files contain all the actions and details of the
business processes executed, their analysis may determine the correctness of
the business processes. Nevertheless, event log files might be extremely
large, especially in large organisations with thousands of processes.

Thus, the real challenge is to exploit event data in a meaningful way to
obtain knowledge about the organisation, provide insights, identify
bottlenecks, anticipate problems and apply appropriate countermeasures. This
need led to the birth of process mining. Process mining is a young research
discipline that embraces machine learning and data mining techniques on the
one hand, and process modelling and analysis on the other hand. The main idea
of process mining is to “discover, monitor and improve real processes (i.e. not
assumed processes) by extracting knowledge from event logs readily available
in today’s systems” [1].

The state-of-the-art distinguishes between three different types of process 
mining. The first (and most prominent) process mining type is discovery, a 
technique that produces a process model from an event log without using any 
a-priori information. The α-algorithm [3], one of the first process mining
algorithms, aimed at extracting process models (represented as Petri nets) from
event log files by detecting the relationship between events. The second type 
of process mining is conformance, involving the comparison of an existing 
process model with an event log of the same process. This case is particu- 
larly interesting to verify whether the reality (recorded in a log file) and the 
process model are aligned, and vice versa. The third type of process mining 
is enhancement, aiming at extending or improving an existing process model 
using information about the process recorded in the event log. This kind of 
process mining might be useful in highly dynamic environments, such as the 
healthcare sector, in order to provide up-to-date processes [2]. 


3 Time-efficient distributed computation for Process Mining 


Event log files from organisations might reach large dimensions in short 
periods of time. Hence, the time required for obtaining valuable information 
from the event logs is crucial. Based on this assumption, and taking into 
account the importance of acquiring added-value knowledge rapidly to stand 
out from competitors, distributed and parallel computation may play a key 
role to reduce runtimes. Furthermore, the use of distributed programming
models, such as MapReduce [9], can facilitate the processing and analysis
of large event log files. The key challenge is therefore to design a parallel
process mining algorithm able to extract meaningful knowledge, which
is possible only if events from log files can be analysed in parallel.

The first step consists in transforming event log data from a (semi-)structured 
file into a matrix $M^{n \times p}$, where $n$ represents the number of events (rows) 
and $p$ the number of characteristics (columns). After some transformations, 
matrix $M$ may be similar to Equation 1, where three kinds of characteristics 
can be found: (1) the temporal characteristic $t$, by which the events are sorted 
in ascending order (from the oldest to the newest), (2) the event characteristic $e$ for 
determining the relationships between events (e.g. the identifier of a certain 
activity), and (3) the grouping characteristic $a$ for generating submatrices $M^{a}$. 
Grouping related events into the same submatrix $M^{a}$ opens the possibility 
of a parallel analysis of the relationships between 
events, in which each node of a distributed computing system evaluates a 
part of the events ($M^{a}$). 


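$$M = \left(\begin{array}{c|ccc|c|c} a_1 & m_{1,1} & \cdots & m_{1,p-2} & e_1 & t_1 \\ \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ a_n & m_{n,1} & \cdots & m_{n,p-2} & e_n & t_n \end{array}\right) \qquad (1)$$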


In this preliminary algorithm, the relationship between events is determined 
by a frequency-based approach, providing knowledge on how events 
happen. This approach computes the relative frequencies 
of succession between events as empirical probabilities: given an event 
$e_i$, the probability that the next event is $e_j$ (in other words, $P(e_j \mid e_i)$). Once 
the algorithm concludes, these frequencies can be 
depicted as a graph $G = (V, E)$, where $V$ represents the events from $e$ as 
vertices, and $E$ represents the binary relations between the elements of $V$, each with a 
certain frequency or probability. 
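
The frequency-based step can be sketched as follows, under stated assumptions: each row is reduced to a hypothetical triple (grouping value, event, timestamp), the rows are split into one event sequence per group, successive pairs are counted independently per group (the map step), and the partial counts are summed (the reduce step). Normalising by the number of times $e_i$ has a successor is an assumption of this sketch, and the built-in `map` stands in for a distributed runtime.

```python
from collections import Counter, defaultdict
from functools import reduce


def split_by_group(rows):
    """Split (group, event, timestamp) rows into time-ordered event lists per group."""
    groups = defaultdict(list)
    for group, event, timestamp in rows:
        groups[group].append((timestamp, event))
    return {g: [e for _, e in sorted(pairs)] for g, pairs in groups.items()}


def count_successions(events):
    """Map step: count how often e_j directly follows e_i within one group."""
    return Counter(zip(events, events[1:]))


def succession_probabilities(rows):
    """Return {(e_i, e_j): empirical P(e_j | e_i)} over all groups."""
    groups = split_by_group(rows)
    # Each group is processed independently, so this map could run on separate nodes.
    partial_counts = map(count_successions, groups.values())
    # Reduce step: merge the per-group counters into global counts.
    totals = reduce(lambda a, b: a + b, partial_counts, Counter())
    outgoing = Counter()
    for (e_i, _), n in totals.items():
        outgoing[e_i] += n
    return {(e_i, e_j): n / outgoing[e_i] for (e_i, e_j), n in totals.items()}


# Hypothetical log rows: (grouping characteristic, event, timestamp).
rows = [
    ("case1", "register", 1), ("case1", "triage", 2), ("case1", "discharge", 3),
    ("case2", "register", 1), ("case2", "triage", 2), ("case2", "admit", 3),
    ("case3", "register", 2), ("case3", "discharge", 5),
]
for (e_i, e_j), p in sorted(succession_probabilities(rows).items()):
    print(f"P({e_j} | {e_i}) = {p:.2f}")
```

The resulting dictionary is already an edge list for the graph $G$: its keys are the directed edges and its values are the edge weights.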


4 Conclusions and Future Work 


Although process mining is a young research field, several contributions 
have been made in recent years. The need to add value to the 
products and services that organisations offer leads to the analysis of the event 
log data recorded in their information systems. The large size that 
such files may reach calls for new techniques and programming 
paradigms that minimise the computational cost and time. Applying 
distributed and parallel models to process mining algorithms might facilitate 
the early extraction of knowledge. 

Future research will extend the algorithm presented in this paper and 
validate its robustness and efficiency on process discovery using distributed 
computing systems. Moreover, this system is intended to be implemented in the 
Business Intelligence tool of an organisation from the healthcare sector. 


1 Introduction 


We present a learning method that automatically deduces a penalty cost for 
each edit operation based on a ground truth graph matching. It uses a linear 
classifier applied to a bi-dimensional space in which each element represents 
a node-to-node mapping. The computational cost is cubic with respect to the 
number of pairs of graphs. Moreover, the algorithm does not need to compute 
any graph matching, which is the main obstacle of other methods due to 
its intrinsic exponential computational complexity. Experimental validation 
shows that the matching accuracy and the recognition ratio of this method 
outperform those of current methods. Furthermore, there is a significant reduction 
in the runtime of the learning process. 


2 Learning Graph Edit Distance 


In [2], the human (or another system) interacts with the automatically 
obtained matching between nodes and imposes a new node-to-node mapping. 
Then, the method considers this new mapping and updates the whole matching. 
The method presented in [5] is the only one in the literature that learns 
the insertion and deletion edit costs on nodes and edges as constants (real 
numbers). These costs can be directly applied as the input parameters of 
matching algorithms such as [1], [4] and [3]. It is an optimisation method 
that aims to minimise the distance between the edit cost obtained by the 
ground truth matching and the edit cost obtained by the current matching. 
Our method is compared to this one in the experimental section. 


3 Defining two coordinate systems 


Suppose that we have two bi-dimensional coordinate systems defined by 
axes $(x, y)$ and $(\hat{x}, \hat{y})$. Moreover, suppose that a substitution (first option) of 
node $v_i^p$ by node $v_j^q$, or a deletion of node $v_i^p$ (second option), which is the same 
as substituting $v_i^p$ by a null node $v_\varepsilon^q$, is represented by a point $(x_{(i,j)}, y_{(i,j)})$ 
in the coordinate system $(x, y)$. Similarly, suppose that a substitution (first 
option) of node $v_i^p$ by node $v_j^q$, or an insertion of node $v_j^q$ (third option), which 
is the same as substituting a null node $v_\varepsilon^p$ by $v_j^q$, is represented by a point 
$(\hat{x}_{(i,j)}, \hat{y}_{(i,j)})$ in the coordinate system $(\hat{x}, \hat{y})$. 

Figure 1 schematically shows the position of mappings in the first option (S: 
substitution), the second option (D: deletion) and the third option (I: insertion). 
We also show the borders $C_{(i,j)} - C_{(i,\varepsilon)} = 0$ in the $(x, y)$ system and 
$C_{(i,j)} - C_{(\varepsilon,j)} = 0$ in the $(\hat{x}, \hat{y})$ system. Our learning method is based on finding each 
border in the coordinate systems $(x, y)$ and $(\hat{x}, \hat{y})$, assuming that these borders are 
lines. Then, the parameters $K_v$ and $K_e$ are deduced from the offset and 
slope of these lines. In the next section, we show how we define the coordinate 
systems $(x, y)$ and $(\hat{x}, \hat{y})$ and then how the values of $K_v$ and $K_e$ are modelled. 


Fig. 1: Positions of mappings in the three options, (S) substitution, (D) 
deletion and (I) insertion, and two plausible borders between both sets 
given the coordinate systems $(x, y)$ and $(\hat{x}, \hat{y})$. 


3.1 Learning $K_v$ and $K_e$ 


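In the system $(x, y)$, the border between mappings of the first and second options is 
$C_{(i,j)} - C_{(i,\varepsilon)} = 0$. It has already been demonstrated that the costs of editing two stars are as follows: 


• Substituting two stars: $C_{(i,j)} = C_{vs}(v_i^p, v_j^q) + n_{i,j} \cdot (K_v + K_e) + C_{es}(v_i^p, v_j^q)$ 
• Deleting a star: $C_{(i,\varepsilon)} = K_v + n_i^p \cdot (K_v + K_e)$ 
• Inserting a star: $C_{(\varepsilon,j)} = K_v + n_j^q \cdot (K_v + K_e)$ 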

So, 

$$C_{vs}(v_i^p, v_j^q) + n_{i,j} \cdot (K_v + K_e) + C_{es}(v_i^p, v_j^q) - \left(K_v + n_i^p \cdot (K_v + K_e)\right) = 0$$

Rearranging the terms, we arrive at the expression 

$$\frac{C_{vs}(v_i^p, v_j^q) + C_{es}(v_i^p, v_j^q)}{n_i^p + 1 - n_{i,j}} = K_v - K_e \cdot \frac{n_{i,j} - n_i^p}{n_i^p + 1 - n_{i,j}} \qquad (1)$$


Thus, a mapping of node $v_i^p$ to node $v_j^q$, which represents either a substitution 
(both nodes are non-null) or a deletion (the first node is 
non-null and the second one is a null node), can be represented 
as a point in the coordinate system $(x, y)$ as follows: 


$$y_{(i,j)} = \frac{C_{vs}(v_i^p, v_j^q) + C_{es}(v_i^p, v_j^q)}{n_i^p + 1 - n_{i,j}} \qquad (2)$$

$$x_{(i,j)} = \frac{n_i^p - n_{i,j}}{n_i^p + 1 - n_{i,j}} \qquad (3)$$


In this way, considering equations 1, 2 and 3, the border between mappings 
in the first and the second option can be approximated as the line in the 
coordinate system $(x, y)$ 

$$y = K_e \cdot x + K_v$$
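
The relation between the star costs, the embedding of equations 2 and 3, and the border line can be checked numerically with the short sketch below. All numeric values ($K_v$, $K_e$, the edge counts) are hypothetical, the helper names are not from the paper, and the sketch assumes $n_i^p + 1 - n_{i,j} > 0$. Instead of the linear classifier used by the method, it simply draws the line through two points constructed to lie exactly on the border, which is enough to show that the slope and offset return $K_e$ and $K_v$.

```python
def substitution_cost(c_vs, c_es, n_ij, k_v, k_e):
    """Cost of substituting two stars: C_vs + n_ij * (K_v + K_e) + C_es."""
    return c_vs + n_ij * (k_v + k_e) + c_es


def deletion_cost(n_ip, k_v, k_e):
    """Cost of deleting a star: K_v + n_ip * (K_v + K_e)."""
    return k_v + n_ip * (k_v + k_e)


def embed(c_vs, c_es, n_ip, n_ij):
    """Point (x, y) of a mapping, following equations 3 and 2."""
    denom = n_ip + 1 - n_ij
    return (n_ip - n_ij) / denom, (c_vs + c_es) / denom


def line_through(p1, p2):
    """Slope and offset of the line through two points with different x."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


K_V, K_E = 0.7, 0.3  # hypothetical ground-truth costs
border_points = []
for n_ip, n_ij in [(3, 1), (4, 1)]:  # hypothetical edge counts
    # Pick C_vs + C_es so that substitution and deletion cost exactly the same,
    # i.e. the mapping lies on the border C(i,j) - C(i,eps) = 0.
    total = deletion_cost(n_ip, K_V, K_E) - n_ij * (K_V + K_E)
    c_vs = c_es = total / 2
    gap = substitution_cost(c_vs, c_es, n_ij, K_V, K_E) - deletion_cost(n_ip, K_V, K_E)
    assert abs(gap) < 1e-12
    border_points.append(embed(c_vs, c_es, n_ip, n_ij))

k_e_hat, k_v_hat = line_through(*border_points)
print(f"slope (K_e) = {k_e_hat:.3f}, offset (K_v) = {k_v_hat:.3f}")
```

With real data the points do not lie exactly on the border, which is why a classifier is needed to separate the two classes of mappings and estimate the line.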
Similarly, the coordinate system $(\hat{x}, \hat{y})$ is defined considering mappings in 
the first and third option. Thus, the border between these options is 

$$\hat{y} = K_e \cdot \hat{x} + K_v$$


Fig. 2: Representation of the star substitution, deletion and insertion of the 
example in systems $(x, y)$ and $(\hat{x}, \hat{y})$. 
From a practical point of view, our method has three main advantages 
with respect to the one in the literature. First, it does not have parameters 
to be tuned, such as a weighting parameter to gauge the regularisation term. 
Second, it is not necessary to impose initial values of $K_v$ and $K_e$. And third, 
the learning runtime is much shorter. A possible drawback is the 
need for a ground truth correspondence. Nevertheless, in our databases, 
we have seen that the costs learned from a few samples were almost the same as 
those learned from the whole learning set. This is because both classes in the domains 
$(x, y)$ and $(\hat{x}, \hat{y})$ were well clustered. Only in the Rotation-Zoom database 
were the classes mixed up when the substitution weights $w$ were not learned. 
Nevertheless, this problem was partially solved when the learned weights were 
applied. 